ELLIPTIC VARMA MODEL 

By Marc Hallin and Davy Paindaveine 1 

Université Libre de Bruxelles 

We derive optimal rank-based tests for the adequacy of a 
vector autoregressive-moving average (VARMA) model with ellipti- 
cally contoured innovation density. These tests are based on the ranks 
of pseudo-Mahalanobis distances and on normed residuals computed 
from Tyler's [Ann. Statist. 15 (1987) 234-251] scatter matrix; they 
generalize the univariate signed rank procedures proposed by Hallin 
and Puri [J. Multivariate Anal. 39 (1991) 1-29]. Two types of optimality 
properties are considered, both in the local and asymptotic 
sense, a la Le Cam: (a) (fixed-score procedures) local asymptotic min- 
imaxity at selected radial densities, and (b) (estimated-score proce- 
dures) local asymptotic minimaxity uniform over a class T of radial 
densities. Contrary to their classical counterparts, based on cross- 
covariance matrices, these tests remain valid under arbitrary ellip- 
tically symmetric innovation densities, including those with infinite 
variance and heavy-tails. We show that the AREs of our fixed-score 
procedures, with respect to traditional (Gaussian) methods, are the 
same as for the tests of randomness proposed in Hallin and Pain- 
daveine [Bernoulli 8 (2002b) 787-815]. The multivariate serial exten- 
sions of the classical Chernoff-Savage and Hodges-Lehmann results 
obtained there thus also hold here; in particular, the van der Waerden 
versions of our tests are uniformly more powerful than those based 
on cross-covariances. As for our estimated-score procedures, they are 
fully adaptive, hence, uniformly optimal over the class of innovation 
densities satisfying the required technical assumptions. 

1. Introduction. 

1.1. Multivariate signs and ranks. Much attention has been given recently 
to the development of invariant, distribution-free and robust methods 
in the context of multivariate analysis. Whereas such concepts as medians, 
quantiles, ranks or signs have been present in the classical toolkit of uni- 
variate statistical inference for about half a century, the emergence of their 
multivariate counterparts has been considerably slower. 

A fairly complete theory of rank and sign methods for multivariate anal- 
ysis was elaborated in the sixties, culminating in the monograph by Puri 
and Sen (1971). This theory however suffers the major weakness of being 
based on componentwise definitions of ranks and signs, yielding procedures 
that heavily depend on the choice of a particular coordinate system. It took 
about twenty years to see the emergence of a systematic development of 
coordinate-free, affine-invariant competitors to these componentwise sign 
and rank methods. 

This development, initiated in the late eighties, essentially expanded along 
two distinct lines of research. The first one, based on the so-called Oja 
signs and ranks, is due to Oja, Hettmansperger and their collaborators 
[Möttönen and Oja (1995), Möttönen, Oja and Tienari (1997), Möttönen, 
Hettmansperger, Oja and Tienari (1998), Hettmansperger, Nyblom and Oja 
(1994) and Hettmansperger, Möttönen and Oja (1997); see Oja (1999) for 
a review]. The second one is associated with Randles' concept of 
interdirections, and was developed by Randles and his coauthors [Randles (1989), 
Peters and Randles (1990), Jan and Randles (1994) and Um and Randles 
(1998)]. For both groups of methods, only location problems (one- and 
two-sample problems, analysis of variance, ...) were considered, and optimality 
issues were not investigated. This problem of optimality was addressed 
for the first time in Hallin and Paindaveine (2002a) who, still for the location 
problem, construct fully affine-invariant methods based on Randles' 
interdirections and the so-called pseudo-Mahalanobis ranks that are also 
fully efficient (in the Le Cam sense) for the multivariate one-sample 
location model. Invariance and robustness on one side, efficiency on the other, 
thus, should not be perceived as totally irreconcilable objectives. 

The case of multivariate time series problems, in this respect, is much 
worse, despite the recognized need for invariant, distribution-free and ro- 
bust methods in the area. The univariate context has been systematically 
explored, with a series of papers by Hallin, Ingenbleek and Puri (1985), 
Hallin and Mélard (1988) and Hallin and Puri (1988, 1991, 1994) on rank 
and signed rank methods for autoregressive-moving average (ARMA) and 
a few other [see, e.g., Hallin and Werker (2003)] time series models. But, 
except for a componentwise rank approach [Hallin, Ingenbleek and Puri 
(1989), Hallin and Puri (1995)] to the problem of testing multivariate white 
noise against vector autoregressive-moving average (VARMA) dependence — 
suffering the same lack of affine-invariance as its nonserial counterparts — and 
an affine-invariant approach to the same problem based on interdirections 
and pseudo-Mahalanobis ranks [Hallin and Paindaveine (2002b)], the 
multivariate situation so far remains virtually unexplored from this point of 
view. 

Our objective is to develop a complete, fully operational theory of opti- 
mal signed rank tests for linear restrictions on the parameters of multire- 
sponse linear models with VARMA errors and unspecified elliptically sym- 
metric innovation densities. These tests are based on the ranks of pseudo- 
Mahalanobis distances and on normed standardized residuals computed from 
Tyler's (1987) scatter matrix. They generalize the univariate signed rank 
procedures proposed by Hallin and Puri (1991, 1994) (Tyler-normed resid- 
uals playing the role of multivariate signs). Two types of optimality prop- 
erties are considered, both in the local and asymptotic sense, a la Le Cam: 
(a) (fixed-score procedures) local asymptotic minimaxity at selected radial 
densities, and (b) (estimated-score procedures) local asymptotic minimax- 
ity, uniform over the set of all radial densities (satisfying adequate regularity 
assumptions). 

Fulfilling such an objective requires a series of steps, each of which plays 
an essential role in the construction of the final methods: 

(i) defining the adequate (asymptotically sufficient, in the Le Cam sense) 
rank-based measures of serial dependence, establishing the required asymp- 
totic representation and central-limit results and constructing the optimal 
tests for fully specified values of the parameters; this is the aim of the present 
paper, which also works out the algebraic problems related with the singu- 
larity of information matrices; 

(ii) characterizing the class of linear hypotheses that are invariant under 
linear transformations, and for which affine-invariant multivariate rank tests 
make sense (a problem that does not appear in one-dimensional setting); this 
characterization is obtained in Hallin and Paindaveine (2003); 

(iii) establishing the asymptotic linearity of the test statistics obtained 
in this paper [see (i) above]; this linearity is required if estimated 
residuals are to be substituted for the exact ones in the computation of 
multivariate ranks and signs, and is the subject of Hallin and Paindaveine 
(2004a); 

(iv) finally, obtaining the optimal aligned sign and rank tests for linear 
restrictions, with a detailed and explicit description of some important spe- 
cial cases such as testing for the orders of a VARMA error or a rank-based 
solution to the multivariate Durbin-Watson problem; see Hallin and 
Paindaveine (2005). 

1.2. The benefits of a rank-based approach. Introducing ranks in 
multivariate time series problems is not just a mathematically challenging 
exercise, or a matter of theoretical aesthetics. The benefits of rank-based 
methods indeed are multiple and "mutually orthogonal" in the sense that none 
of them is obtained at the expense of the others. 

If efficiency is the main objective, the generalized Chernoff-Savage and 
Hodges-Lehmann results of Section 7.2 are important selling points: the fact 
that asymptotic relative efficiencies with respect to daily-practice methods 
are never less than one, for instance, is not a small advantage. 

But there is much more. The tools we are using here were first devel- 
oped (in the simpler context of independent observations) in the robustness 
literature. Robustness (in the vague but reasonably well-understood sense 
of "resistance to outliers") and efficiency objectives, to a large extent, thus 
can be attained simultaneously. Invariance — whether strict (with respect to 
affine transformations) or approximate (with respect to order-preserving ra- 
dial transformations: see Section 4 for precise statements) — is another fun- 
damental feature of the methods we are proposing. A major consequence 
of invariance indeed is distribution-freeness, hence exact, similar, unbiased 
tests, even in the presence of heavy-tailed distributions. The methods we are 
developing thus remain valid under a very broad class of densities, whereas 
everyday practice requires finite second-order moments, often fourth-order 
ones. 

Due to the need for consistent estimation of nuisance parameters, some 
of these benefits of invariance (such as heavy-tail validity) unavoidably have 
to be tuned down when testing for linear restrictions on the parameters. 
One way around this problem would consist in modeling median or quantile 
(auto)regressions, but this is much beyond the scope of the present paper. 
Most of the nice consequences of invariance however remain, under approx- 
imate or asymptotic form. 

Finally, the methods proposed are fully applicable; see Hallin and Pain- 
daveine (2004b) for an application to VAR order identification. 

1.3. Outline of the paper. The starting point in this paper is a local 
asymptotic normality (LAN) result by Garel and Hallin (1995). This LAN 
result allows for deriving testing procedures that are locally and 
asymptotically optimal under a given innovation density $f$, based on a non-Gaussian 
form of cross-covariances, the residual $f$-cross-covariance matrices. 

However, due to the possibility of singular local information matrices [such 
singularities occur as soon as the VARMA($p_1, q_1$) neighborhood of a null 
VARMA($p_0, q_0$) model with $p_0 < p_1$ and $q_0 < q_1$ is considered], the optimal 
test statistics involve unpleasant generalized inverses, which obscures their 
asymptotic behavior. Therefore, we first restate the LAN property by 
rewriting the central sequence in a way that explicitly involves the ranks (in the 
algebraic sense) of local information matrices, and allows for "generalized 
inverse-free" locally and asymptotically optimal procedures (see the 
comments after Proposition 1). Next, we show how to replace the "parametric" 
residual $f$-cross-covariance matrices appearing in the central sequences with 
"nonparametric" versions, based on the ranks of the Mahalanobis distances 
and the estimated standardized residuals computed from Tyler's scatter 
matrix. 

Tyler's scatter matrix enjoys highly desirable equivariance/invariance prop- 
erties. These properties extend to our test statistics; in particular, they are 
asymptotically invariant under monotone radial transformations of the resid- 
uals, hence, asymptotically distribution-free with respect to the underlying 
radial density. They also are asymptotically distribution-free with respect to 
the scatter parameter; it should be stressed, however, that this latter 
property does not follow from any affine-invariance property. Unlike the null 
hypotheses of location or randomness considered in Hallin and Paindaveine 
(2002a, b), hypotheses involving general VARMA dependence, as a rule, are 
not affine-invariant: see Hallin and Paindaveine (2003) for a precise 
characterization. Actually, our test statistics are strictly affine-invariant whenever 
the underlying testing problems are, which is of course the most sensible 
affine-invariance property we can hope for. 

We conclude the paper by computing the asymptotic relative efficiencies 
of the proposed nonparametric procedures with respect to the Gaussian 
ones, showing that the AREs, as well as the generalized Chernoff-Savage and 
Hodges-Lehmann theorems obtained in Hallin and Paindaveine (2002b), are 
still valid here. 

The paper is organized as follows. In Section 2 we describe the testing 
problem under study, and state the main assumptions to be made. The LAN 
structure of the model is established in Section 3, with a central sequence 
that exploits the assumptions of elliptical symmetry. The multivariate coun- 
terparts of traditional ranks and signs are based on Tyler's scatter ma- 
trix, the corresponding Tyler residuals and the so-called pseudo-Mahalanobis 
ranks. These concepts are defined in Section 4, where their consistency and 
invariance properties are also derived. They are used, in Section 5, in the 
definition of a concept of nonparametric residual cross-covariance matrices, 
extending to the multivariate context the notion of rank-based autocorre- 
lations developed in Hallin and Puri (1988, 1991). These matrices allow 
for a reconstruction of central sequences, hence, for nonparametric locally 
asymptotically optimal procedures. Two types of optimality properties are 
considered in Section 6, both in the local and asymptotic sense, a la Le Cam 
(we use the term "minimaxity" even though the tests are "maximin" rather 
than "minimax"): 

(a) (fixed-score procedures) local asymptotic minimaxity at selected ra- 
dial densities, and 

(b) (estimated-score procedures) local asymptotic minimaxity, uniform 
over a broad class of radial densities. 
In both cases the proposed tests remain valid under arbitrary elliptically 
symmetric innovation densities, including those with infinite variance. In 
Section 7 the asymptotic relative efficiencies of the proposed procedures, 
with respect to their Gaussian counterparts (based on classical cross-covariances), 
are derived. Proofs are concentrated in the Appendix. 

2. Notation and main assumptions. Consider the VARMA($p_1, q_1$) model 
defined by the stochastic difference equation 

(1) $\Bigl(I_k - \sum_{i=1}^{p_1} A_i L^i\Bigr) X_t = \Bigl(I_k + \sum_{i=1}^{q_1} B_i L^i\Bigr) \varepsilon_t, \qquad t \in \mathbb{Z},$

where $A_1, \ldots, A_{p_1}, B_1, \ldots, B_{q_1}$ are $k \times k$ real matrices ($I_k$ stands for the 
$k$-dimensional identity matrix), $L$ denotes the lag operator and $\{\varepsilon_t \mid t \in \mathbb{Z}\}$ 
is an absolutely continuous $k$-variate white noise process. The parameter 
of interest is $\theta := ((\mathrm{vec}\,A_1)', \ldots, (\mathrm{vec}\,A_{p_1})', (\mathrm{vec}\,B_1)', \ldots, (\mathrm{vec}\,B_{q_1})')'$, with 
values in $\mathbb{R}^{k^2(p_1+q_1)}$. 
Fixing some value 

$\theta_0 := ((\mathrm{vec}\,A_1)', \ldots, (\mathrm{vec}\,A_{p_0})', 0'_{k^2(p_1-p_0)\times 1}, (\mathrm{vec}\,B_1)', \ldots, (\mathrm{vec}\,B_{q_0})', 0'_{k^2(q_1-q_0)\times 1})'$

of the parameter that satisfies Assumption (A) below, we want to test 
the null hypothesis $\theta = \theta_0$ against the alternative $\theta \neq \theta_0$. Choosing $p_0 < p_1$ 
and/or $q_0 < q_1$ allows one to test the adequacy of the specified VARMA 
coefficients in $\theta_0$, while contemplating the possibility of higher-order 
VARMA models. If the order is not an issue, one can just let $p_0 = p_1$ 
and $q_0 = q_1$. 

The null VARMA model must satisfy the usual causality and invertibility 
conditions. More precisely, we assume the following on $\theta_0$: 

Assumption (A). All solutions of $\det(I_k - \sum_{i=1}^{p_0} A_i z^i) = 0$, $z \in \mathbb{C}$, and 
$\det(I_k + \sum_{i=1}^{q_0} B_i z^i) = 0$, $z \in \mathbb{C}$ ($|A_{p_0}| \neq 0 \neq |B_{q_0}|$), lie outside the unit ball 
in $\mathbb{C}$. Moreover, the greatest common left divisor of $I_k - \sum_{i=1}^{p_0} A_i z^i$ and 
$I_k + \sum_{i=1}^{q_0} B_i z^i$ is $I_k$. 

Write $A(L)$ and $B(L)$ for the difference operators $I_k - \sum_{i=1}^{p_0} A_i L^i$ and 
$I_k + \sum_{i=1}^{q_0} B_i L^i$, respectively. Letting $B_0 := I_k$, recall that the Green's 
matrices $H_u$, $u \in \mathbb{N}$, associated with $B(L)$ are defined by the linear recursion 
$\sum_{j=0}^{\min(q_0,u)} B_j H_{u-j} = \delta_{u0} I_k$, where $\delta_{u0} = 1$ if $u = 0$, and $\delta_{u0} = 0$ otherwise. 
Assumption (A) also allows for defining these Green's matrices by 

(2) $\sum_{u=0}^{\infty} H_u z^u := \Bigl(I_k + \sum_{i=1}^{q_0} B_i z^i\Bigr)^{-1}, \qquad z \in \mathbb{C},\ |z| < 1;$

as a consequence, the same matrices also satisfy $\sum_{j=0}^{\min(q_0,u)} B_j' H_{u-j}' = \delta_{u0} I_k$. 
Assumption (A), moreover, ensures the existence of some $\varepsilon > 0$ such that 
the series in (2) is absolutely and uniformly convergent in the closed ball 
in $\mathbb{C}$ with center 0 and radius $1 + \varepsilon$. Consequently, $\|H_u\|(1+\varepsilon)^u$ goes to 0 
as $u$ goes to infinity. This exponential decrease ensures that $(\|H_u\|, u \in \mathbb{N})$ 
belongs to $l^p(\mathbb{N})$ for all $p > 0$, where $l^p(\mathbb{N})$ denotes the set of all sequences 
$(x_u, u \in \mathbb{N})$ for which $\sum_{u=0}^{\infty} |x_u|^p < \infty$. Of course, the same remarks also hold, 
with obvious changes, for Green's matrices $G_u$, $u \in \mathbb{N}$, associated with the 
operator $A(L)$. For simplicity, we do not indicate the strong dependence on 
$\theta_0$ of $G_u$ and $H_u$, which of course are associated with the null operators 
$A(L)$ and $B(L)$. 
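
As an illustration of the recursion defining Green's matrices, the following minimal sketch computes $H_0, H_1, \ldots$ for a bivariate operator $B(L)$ of order two. The coefficient matrices are hypothetical values chosen only so that Assumption (A) holds, and the function name is ours; NumPy is assumed to be available.

```python
import numpy as np

def green_matrices(B, n_terms):
    """Return H_0, ..., H_{n_terms-1} for B(L) = I + sum_i B[i-1] L^i.

    Implements sum_{j=0}^{min(q,u)} B_j H_{u-j} = delta_{u0} I with B_0 = I.
    """
    k = B[0].shape[0]
    H = [np.eye(k)]
    for u in range(1, n_terms):
        acc = np.zeros((k, k))
        for j in range(1, min(len(B), u) + 1):
            acc += B[j - 1] @ H[u - j]
        H.append(-acc)  # B_0 H_u = -sum_{j>=1} B_j H_{u-j}
    return H

# Hypothetical coefficients: the roots of det B(z) = 0 are -10/3, -5, -5, -10,
# all outside the unit ball, as required by Assumption (A).
B1 = np.array([[0.5, 0.1], [0.0, 0.3]])
B2 = np.array([[0.06, 0.0], [0.0, 0.02]])
H = green_matrices([B1, B2], 40)

# The companion identity sum_j H_{u-j} B_j = delta_{u0} I, checked at u = 7.
Bs = [np.eye(2), B1, B2]
u = 7
right_sum = sum(H[u - j] @ Bs[j] for j in range(3))

# Norms of H_u, which should decrease exponentially fast.
norms = [np.linalg.norm(Hu) for Hu in H]
```

In this example the norms decay roughly like $0.3^u$, in line with the exponential decrease that follows from Assumption (A).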

Under Assumption (A), the white noise $\{\varepsilon_t\}$ is $\{X_t\}$'s innovation process. 
The set of Assumptions (B) deals with the density of this innovation. As 
indicated, we restrict ourselves to a class of elliptically symmetric densities. 

Assumption (B1). Denote by $\Sigma$ a symmetric positive definite $k \times k$ 
matrix, and by $f : \mathbb{R}_0^+ \to \mathbb{R}^+$ a nonnegative function, such that $f > 0$ a.e. and 
$\int_0^\infty r^{k-1} f(r)\,dr < \infty$. We will assume throughout that $\{\varepsilon_1^{(n)}, \ldots, \varepsilon_n^{(n)}\}$ is a 
finite realization of an elliptic white noise process with scatter matrix $\Sigma$ and 
radial density $f$, that is, such that its probability density at $(z_1, \ldots, z_n) \in \mathbb{R}^{nk}$ 
is of the form $\prod_{t=1}^n \underline{f}(z_t; \Sigma, f)$, where 

(3) $\underline{f}(z_1; \Sigma, f) := c_{k,f}\,(\det \Sigma)^{-1/2} f(\|z_1\|_\Sigma), \qquad z_1 \in \mathbb{R}^k.$

Here $\|z\|_\Sigma := (z'\Sigma^{-1}z)^{1/2}$ denotes the norm of $z$ in the metric associated 
with $\Sigma$. The constant $c_{k,f}$ is the normalization factor $(\omega_k \mu_{k-1;f})^{-1}$, where 
$\omega_k$ stands for the $(k-1)$-dimensional Lebesgue measure of the unit sphere 
$\mathcal{S}^{k-1} \subset \mathbb{R}^k$, and $\mu_{l;f} := \int_0^\infty r^l f(r)\,dr$. 

Note that the scatter matrix $\Sigma$ in (3) need not be (a multiple of) the 
covariance matrix of the observations, which may not exist, and that $f$ is not, 
strictly speaking, a probability density; see Hallin and Paindaveine (2002a) 
for a discussion. Moreover, $\Sigma$ and $f$ are identified up to an arbitrary scale 
transformation only. More precisely, for any $a > 0$, letting $\Sigma_a := a^2 \Sigma$ and 
$f_a(r) := f(ar)$, we have $\underline{f}(x; \Sigma, f) = \underline{f}(x; \Sigma_a, f_a)$. This will be harmless in 
the sequel since cross-covariances, central sequences, as well as all statistics 
considered, are insensitive to a variation of $a$. 

We will denote by $\Sigma^{-1/2}$ the unique upper-triangular $k \times k$ array with 
positive diagonal elements that satisfies $\Sigma^{-1} = (\Sigma^{-1/2})'\Sigma^{-1/2}$. With this 
notation, $\Sigma^{-1/2}\varepsilon_1^{(n)}/\|\Sigma^{-1/2}\varepsilon_1^{(n)}\|, \ldots, \Sigma^{-1/2}\varepsilon_n^{(n)}/\|\Sigma^{-1/2}\varepsilon_n^{(n)}\|$ are i.i.d., and 
uniformly distributed over $\mathcal{S}^{k-1}$. Similarly, $\|\Sigma^{-1/2}\varepsilon_1^{(n)}\|, \ldots, \|\Sigma^{-1/2}\varepsilon_n^{(n)}\|$ are 
i.i.d. with probability density function 

(4) $\tilde{f}_k(r) := (\mu_{k-1;f})^{-1}\, r^{k-1} f(r)\, I_{[r>0]},$

where $I_E$ denotes the indicator function associated with the Borel set $E$. The 
terminology radial density will be used for $f$ and $\tilde{f}_k$ indifferently (though 
only $\tilde{f}_k$, of course, is a genuine probability density). We denote by $\tilde{F}_k$ the 
distribution function associated with $\tilde{f}_k$. 
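
The decomposition of an elliptical vector into a radial part and a direction can be sketched numerically as follows. The example assumes Gaussian innovations (one particular member of the elliptical family) and an arbitrary, hypothetical scatter matrix; the helper name `inv_sqrt_upper` is ours. The upper-triangular $\Sigma^{-1/2}$ is obtained from the Cholesky factor of $\Sigma^{-1}$.

```python
import numpy as np

def inv_sqrt_upper(sigma):
    """Upper-triangular R with positive diagonal such that sigma^{-1} = R' R."""
    L = np.linalg.cholesky(np.linalg.inv(sigma))  # sigma^{-1} = L L', L lower
    return L.T

rng = np.random.default_rng(0)
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])        # hypothetical scatter matrix
R = inv_sqrt_upper(sigma)

# Gaussian innovations, used here only as a convenient elliptical example.
eps = rng.multivariate_normal(np.zeros(2), sigma, size=5000)

std = eps @ R.T                   # rows are Sigma^{-1/2} eps_t
d = np.linalg.norm(std, axis=1)   # radial parts ||Sigma^{-1/2} eps_t||
U = std / d[:, None]              # directions on the unit sphere
```

Under these assumptions the rows of `U` should look uniformly distributed on the unit circle (empirical mean close to zero, empirical second-moment matrix close to $I_k/k$), while `d ** 2` behaves like a chi-square variable with $k = 2$ degrees of freedom.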

Write $\mathcal{H}^{(n)}(\theta_0, \Sigma, f)$ for the hypothesis under which an observation 
$X^{(n)} := (X_1^{(n)}, \ldots, X_n^{(n)})'$ is generated by the VARMA($p_0, q_0$) model (1) with 
parameter value $\theta_0$ satisfying Assumption (A) and innovation process satisfying 
Assumption (B1) with "scatter parameter" $\Sigma$ and radial density $f$. Our 
objective is to test $\mathcal{H}^{(n)}(\theta_0) := \bigcup_{\Sigma} \bigcup_{f} \mathcal{H}^{(n)}(\theta_0, \Sigma, f)$ against $\bigcup_{\theta \neq \theta_0} \mathcal{H}^{(n)}(\theta)$. 
Consequently, $\Sigma$ and $f$ play the role of nuisance parameters; note that the 
unions, in the definition of $\mathcal{H}^{(n)}(\theta_0)$, are taken over all possible values of $\Sigma$ 
and $f$. 

The methodology we will adopt in the derivation of optimality results 
is based on Le Cam's asymptotic theory of experiments. This requires the 
model (characterized by some fixed value of $\Sigma$ and $f$) at which optimality 
is sought to be locally and asymptotically normal (LAN). LAN, of course, 
does not hold without a few, rather mild, regularity conditions: finite 
second-order moments and finite Fisher information for $f$, and quadratic mean 
differentiability of $f^{1/2}$. These technical assumptions are taken into account 
in Assumptions (B1') and (B2), in a form that is adapted to the elliptical 
context. We stress, however, that these assumptions are made on 
the density at which optimality is desired, not on the actual density. 

Assumption (B1'). Same as Assumption (B1), but we further assume 
that $\mu_{k+1;f} < \infty$. 

Note that Assumption (B1') is also required when Gaussian procedures 
are to be considered, since these also require the second-order moments of 
the underlying distribution to be finite. 

Assumption (B2) below is strictly equivalent to the assumption that $f^{1/2}$ is 
differentiable in quadratic mean [see Hallin and Paindaveine (2002a), 
Proposition 1]. However, it has the important advantage of involving univariate 
quadratic mean differentiability only. Let $L^2(\mathbb{R}_0^+, \mu_l)$ denote the space of all 
functions that are square-integrable with respect to the Lebesgue measure 
with weight $r^l$ on $\mathbb{R}_0^+$, that is, the space of measurable functions $h : \mathbb{R}_0^+ \to \mathbb{R}$ 
satisfying $\int_0^\infty [h(r)]^2 r^l\,dr < \infty$. 

Assumption (B2). The square root $f^{1/2}$ of the radial density $f$ is in 
$W^{1,2}(\mathbb{R}_0^+, \mu_{k-1})$, where $W^{1,2}(\mathbb{R}_0^+, \mu_{k-1})$ denotes the subspace of $L^2(\mathbb{R}_0^+, \mu_{k-1})$ 
containing all functions admitting a weak derivative that also belongs to 
$L^2(\mathbb{R}_0^+, \mu_{k-1})$. 

Denoting by $(f^{1/2})'$ the weak derivative of $f^{1/2}$ in $L^2(\mathbb{R}_0^+, \mu_{k-1})$, let 
$\varphi_f := -2\,(f^{1/2})'/f^{1/2}$: Assumption (B2) ensures the finiteness of the radial Fisher 
information 

$\mathcal{I}_{k,f} := (\mu_{k-1;f})^{-1} \int_0^\infty [\varphi_f(r)]^2\, r^{k-1} f(r)\,dr.$

In the last set of assumptions, we describe the score functions $K_1$, $K_2$ to 
be used when building rank-based statistics in this serial context. 

Assumption (C). The score functions $K_\ell : \,]0,1[\, \to \mathbb{R}$, $\ell = 1, 2$, are 
continuous differences of two monotone increasing functions, and satisfy 
$\int_0^1 [K_\ell(u)]^2\,du < \infty$ ($\ell = 1, 2$). 

The score functions yielding locally and asymptotically optimal 
procedures, as we shall see, are of the form $K_1 := \varphi_{f_*} \circ \tilde{F}_{*k}^{-1}$ and $K_2 := \tilde{F}_{*k}^{-1}$, for 
some radial density $f_*$. Assumption (C) then takes the form of an 
assumption on $f_*$: 

Assumption (C'). The radial density $f_*$ is such that $\varphi_{f_*}$ is the 
continuous difference of two monotone increasing functions, $\mu_{k+1;f_*} < \infty$ and 
$\int_0^\infty [\varphi_{f_*}(r)]^2\, r^{k-1} f_*(r)\,dr < \infty$. 

The assumption of being the difference of two monotone functions, which 
characterizes the functions with bounded variation, is extremely mild. In 
most cases ($f_*$ normal, double exponential, ...), $\varphi_{f_*}$ itself is monotone 
increasing, and, without loss of generality, this will be assumed to hold for 
the proofs. The multivariate $t$-distributions, however, provide examples of 
nonmonotone score functions $\varphi_{f_*}$ satisfying Assumption (C'). 

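3. Local asymptotic normality. Let $A(L)$ and $B(L)$ be such that $A_i := 0$ 
for $i = p_0 + 1, \ldots, p_1$, and $B_i := 0$ for $i = q_0 + 1, \ldots, q_1$, and consider the 
sequences of linear difference operators 

$A^{(n)}(L) := I_k - \sum_{i=1}^{p_1} \bigl(A_i + n^{-1/2}\gamma_i^{(n)}\bigr) L^i \quad\text{and}\quad B^{(n)}(L) := I_k + \sum_{i=1}^{q_1} \bigl(B_i + n^{-1/2}\delta_i^{(n)}\bigr) L^i,$

where the vector $\tau^{(n)} := ((\mathrm{vec}\,\gamma_1^{(n)})', \ldots, (\mathrm{vec}\,\gamma_{p_1}^{(n)})', (\mathrm{vec}\,\delta_1^{(n)})', \ldots, (\mathrm{vec}\,\delta_{q_1}^{(n)})')' \in \mathbb{R}^{k^2(p_1+q_1)}$ 
is such that $\sup_n (\tau^{(n)})'\tau^{(n)} < \infty$. These operators define a 
sequence of VARMA models 

$A^{(n)}(L) X_t = B^{(n)}(L)\,\varepsilon_t, \qquad t \in \mathbb{Z},$

hence, in the notation of Section 2, the sequence of local alternatives 
$\mathcal{H}^{(n)}(\theta_0 + n^{-1/2}\tau^{(n)}, \Sigma, f)$. 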

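Let $(Z_1^{(n)}(\theta_0), \ldots, Z_n^{(n)}(\theta_0))$ be the $n$-tuple of residuals computed from the 
initial values $\varepsilon_{-q_0+1}, \ldots, \varepsilon_0$ and $X^{(n)}_{-p_0+1}, \ldots, X^{(n)}_0$ and the observed series 
$(X_1^{(n)}, \ldots, X_n^{(n)})$ via the relation 

$Z_t^{(n)}(\theta_0) = \sum_{i=0}^{t-1} H_i \Bigl( X^{(n)}_{t-i} - \sum_{j=1}^{p_0} A_j X^{(n)}_{t-i-j} \Bigr) + (H_{t+q_0-1} \ \cdots \ H_t) \begin{pmatrix} I_k & 0 & \cdots & 0 \\ B_1 & I_k & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ B_{q_0-1} & B_{q_0-2} & \cdots & I_k \end{pmatrix} \begin{pmatrix} \varepsilon_{-q_0+1} \\ \vdots \\ \varepsilon_0 \end{pmatrix}.$

Assumption (A) ensures that neither the (generally unobserved) values $(\varepsilon_{-q_0+1}, \ldots, \varepsilon_0)$ 
of the innovation, nor the initial values $(X^{(n)}_{-p_0+1}, \ldots, X^{(n)}_0)$, have an influence 
on asymptotic results, so that they all safely can be set to zero in the sequel. 
Decompose $Z_t^{(n)}(\theta_0)$ into 

$Z_t^{(n)}(\theta_0) = d_t^{(n)}(\theta_0, \Sigma)\, \Sigma^{1/2}\, U_t^{(n)}(\theta_0, \Sigma),$

where $\Sigma^{1/2} := (\Sigma^{-1/2})^{-1}$, $d_t^{(n)}(\theta_0, \Sigma) := \|Z_t^{(n)}(\theta_0)\|_\Sigma$ and $U_t^{(n)}(\theta_0, \Sigma) := \Sigma^{-1/2} Z_t^{(n)}(\theta_0)/d_t^{(n)}(\theta_0, \Sigma)$. 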
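
With all initial values set to zero, the residuals can equivalently be obtained from the recursion $B(L) Z_t = A(L) X_t$. The sketch below simulates a bivariate VARMA(1,1) series with hypothetical coefficients, computes the residuals in this way, and compares them with the innovations actually used in the simulation; the function name and all numerical values are ours, not the paper's.

```python
import numpy as np

def varma_residuals(X, A, B):
    """Residuals Z_t solving B(L) Z_t = A(L) X_t, all initial values set to zero.

    A(L) = I - sum_j A[j-1] L^j and B(L) = I + sum_j B[j-1] L^j.
    """
    n, k = X.shape
    Z = np.zeros((n, k))
    for t in range(n):
        z = X[t].copy()
        for j, Aj in enumerate(A, start=1):
            if t - j >= 0:
                z -= Aj @ X[t - j]
        for j, Bj in enumerate(B, start=1):
            if t - j >= 0:
                z -= Bj @ Z[t - j]
        Z[t] = z
    return Z

rng = np.random.default_rng(1)
A1 = np.array([[0.4, 0.1], [0.0, 0.2]])   # hypothetical AR coefficient
B1 = np.array([[0.3, 0.0], [0.1, 0.5]])   # hypothetical MA coefficient
n, burn = 300, 200
eps = rng.standard_normal((n + burn, 2))

# Simulate X_t - A1 X_{t-1} = eps_t + B1 eps_{t-1}, discarding a burn-in period.
Xfull = np.zeros((n + burn, 2))
for t in range(1, n + burn):
    Xfull[t] = A1 @ Xfull[t - 1] + eps[t] + B1 @ eps[t - 1]
X = Xfull[burn:]

Z = varma_residuals(X, [A1], [B1])
err = np.linalg.norm(Z - eps[burn:], axis=1)   # effect of the zero initial values
```

In this example `err` decreases geometrically with $t$, which illustrates why the choice of initial values has no influence on asymptotic results under Assumption (A).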
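Writing $\boldsymbol{\varphi}_f := -2\,(D\underline{f}^{1/2})/\underline{f}^{1/2}$, where $D\underline{f}^{1/2}$ denotes the quadratic mean 
gradient of the square root $\underline{f}^{1/2}$ of the elliptical density $\underline{f}(\cdot\,; \Sigma, f)$ in (3), define, 
as in Garel and Hallin (1995), the residual $f$-cross-covariance matrix of lag $i$ as 

(5) $\Gamma^{(n)}_{i;\Sigma,f}(\theta_0) := (n-i)^{-1} \sum_{t=i+1}^{n} \boldsymbol{\varphi}_f\bigl(Z_t^{(n)}(\theta_0)\bigr)\, Z_{t-i}^{(n)\prime}(\theta_0).$

Due to the elliptical structure of $\underline{f}$, these cross-covariance matrices take the 
form 

(6) $\Gamma^{(n)}_{i;\Sigma,f}(\theta_0) = (\Sigma^{-1/2})' \Bigl( \frac{1}{n-i} \sum_{t=i+1}^{n} \varphi_f\bigl(d_t^{(n)}(\theta_0,\Sigma)\bigr)\, d_{t-i}^{(n)}(\theta_0,\Sigma)\, U_t^{(n)}(\theta_0,\Sigma)\, U_{t-i}^{(n)\prime}(\theta_0,\Sigma) \Bigr) (\Sigma^{1/2})'.$

Hallin and Paindaveine (2002b) develop optimal procedures based on 
nonparametric versions of the cross-covariances (5) or (6) for the problem 
of testing elliptic white noise against VARMA dependence. 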
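
The agreement between forms (5) and (6) can be checked numerically in the Gaussian case, where the radial score reduces to $\varphi_f(r) = r$ and the multivariate score to $z \mapsto \Sigma^{-1}z$. The following sketch does so on placeholder residuals; the function names, the scatter matrix and the data are illustrative assumptions only.

```python
import numpy as np

def cross_cov_direct(Z, sigma, lag):
    """Form (5) with the Gaussian score phi_f(z) = Sigma^{-1} z."""
    n = Z.shape[0]
    P = np.linalg.inv(sigma)
    lead, lagged = Z[lag:], Z[:n - lag]
    return P @ lead.T @ lagged / (n - lag)

def cross_cov_elliptic(Z, sigma, lag):
    """Form (6): distances d_t and directions U_t, with radial score phi_f(r) = r."""
    n = Z.shape[0]
    R = np.linalg.cholesky(np.linalg.inv(sigma)).T   # Sigma^{-1/2}, upper triangular
    S = Z @ R.T
    d = np.linalg.norm(S, axis=1)
    U = S / d[:, None]
    w = d[lag:] * d[:n - lag]                        # phi_f(d_t) * d_{t-i}
    core = (U[lag:] * w[:, None]).T @ U[:n - lag] / (n - lag)
    return R.T @ core @ np.linalg.inv(R).T           # (Sigma^{-1/2})' core (Sigma^{1/2})'

rng = np.random.default_rng(4)
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])           # hypothetical scatter matrix
Z = rng.standard_normal((200, 2))                    # placeholder residuals

# Largest entrywise discrepancy between the two forms over lags 1, 2, 3.
gap = max(
    np.abs(cross_cov_direct(Z, sigma, i) - cross_cov_elliptic(Z, sigma, i)).max()
    for i in range(1, 4)
)
```

The two forms should agree up to rounding error; for a non-Gaussian radial density only the weight `w` would change, with $\varphi_f(d_t)$ replacing $d_t$.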

Garel and Hallin (1995) established LAN in this setting (in fact, in a 
more general, possibly nonelliptical, context). The quadratic form in their 
second-order approximation of local log-likelihoods (hence, also in ours) is 
not of full rank, due to the well-known singularity of the information matrix 
for ARMA models. This singularity, quite understandably, has to be taken 
into account in the derivation of locally optimal inference procedures; in 
hypothesis testing, this can be achieved in a straightforward way provided 
the information matrix is factorized in an adequate way (see the comments 
after Proposition 1 for details). This factorization is not provided by Garel 
and Hallin (1995), since they just deal with LAN, and not with optimal 
inference, but it is needed here. Therefore, in Proposition 1, we first restate 
LAN under a slightly different form, in the spirit of the univariate results 
of Hallin and Puri (1994). As usual, though, the multivariate case is a bit 
more intricate, and requires some notational and algebraic preparation. 

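Associated with any $k$-dimensional linear difference operator of the form 
$C(L) := \sum_{i=0}^{\infty} C_i L^i$ (letting $C_i = 0$ for $i > s$, this includes, of course, 
operators of finite order $s$), define for any integers $u$ and $v$ the $k^2u \times k^2v$ matrices 

$C^{(l)}_{u,v} := \begin{pmatrix} C_0 \otimes I_k & 0 & \cdots & 0 \\ C_1 \otimes I_k & C_0 \otimes I_k & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ C_{v-1} \otimes I_k & C_{v-2} \otimes I_k & \cdots & C_0 \otimes I_k \\ \vdots & & & \vdots \\ C_{u-1} \otimes I_k & C_{u-2} \otimes I_k & \cdots & C_{u-v} \otimes I_k \end{pmatrix}$

and 

$C^{(r)}_{u,v} := \begin{pmatrix} I_k \otimes C_0 & 0 & \cdots & 0 \\ I_k \otimes C_1 & I_k \otimes C_0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ I_k \otimes C_{v-1} & I_k \otimes C_{v-2} & \cdots & I_k \otimes C_0 \\ \vdots & & & \vdots \\ I_k \otimes C_{u-1} & I_k \otimes C_{u-2} & \cdots & I_k \otimes C_{u-v} \end{pmatrix},$

respectively; write $C^{(l)}_u$ for $C^{(l)}_{u,u}$ and $C^{(r)}_u$ for $C^{(r)}_{u,u}$. With this notation, 
note that $G^{(l)}_u$, $G^{(r)}_u$, $H^{(l)}_u$ and $H^{(r)}_u$ are the inverses of $A^{(l)}_u$, $A^{(r)}_u$, $B^{(l)}_u$ and 
$B^{(r)}_u$, respectively. Denoting by $C'^{(l)}_{u,v}$ and $C'^{(r)}_{u,v}$ the matrices associated with 
the transposed operator $C'(L) := \sum_{i=0}^{\infty} C_i' L^i$, we also have $G'^{(l)}_u = (A'^{(l)}_u)^{-1}$, 
$H'^{(l)}_u = (B'^{(l)}_u)^{-1}$, and so on. 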
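
These block lower-triangular Toeplitz matrices are straightforward to build with Kronecker products. The sketch below constructs them for a hypothetical first-order operator $A(L) = I_k - A_1 L$, whose Green's matrices are $G_u = A_1^u$, so that the inverse relation between $G^{(l)}_u$ and $A^{(l)}_u$ can be checked directly. The function name and the coefficient values are assumptions made for the illustration.

```python
import numpy as np

def block_toeplitz(C, u, v, side="l"):
    """k^2 u x k^2 v block lower-triangular Toeplitz matrix built from C_0, C_1, ...

    Block (a, b) is C_{a-b} (x) I_k for side="l" and I_k (x) C_{a-b} for side="r",
    with C_i = 0 for i < 0 and for i beyond the supplied list.
    """
    k = C[0].shape[0]
    I = np.eye(k)
    kk = k * k
    out = np.zeros((kk * u, kk * v))
    for a in range(u):
        for b in range(min(a + 1, v)):
            if a - b < len(C):
                blk = np.kron(C[a - b], I) if side == "l" else np.kron(I, C[a - b])
                out[a * kk:(a + 1) * kk, b * kk:(b + 1) * kk] = blk
    return out

A1 = np.array([[0.4, 0.1], [0.0, 0.2]])                 # hypothetical coefficient
A_coef = [np.eye(2), -A1]                               # A(L) = I - A1 L
u = 4
G = [np.linalg.matrix_power(A1, i) for i in range(u)]   # Green's matrices of A(L)

A_l = block_toeplitz(A_coef, u, u, side="l")
G_l = block_toeplitz(G, u, u, side="l")
A_r = block_toeplitz(A_coef, u, u, side="r")
G_r = block_toeplitz(G, u, u, side="r")
```

Both products `G_l @ A_l` and `G_r @ A_r` should equal the identity matrix of order $k^2u$.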
Let $\pi := \max(p_1 - p_0, q_1 - q_0)$ and $\pi_0 := \pi + p_0 + q_0$, and define the 
$k^2\pi_0 \times k^2(p_1 + q_1)$ matrix $M_{\theta_0} := \bigl(G'^{(l)}_{\pi_0,p_1} \ \vdots\ H'^{(l)}_{\pi_0,q_1}\bigr)$: under Assumption (A), 
$M_{\theta_0}$ is of full rank. 

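Finally, consider the operator $D(L) := I_k + \sum_{i=1}^{p_0+q_0} D_i L^i$ [just as $M_{\theta_0}$, 
$D(L)$ and most quantities defined below depend on $\theta_0$; for simplicity, 
however, we are dropping this reference to $\theta_0$], where, putting 
$G_{-1} = G_{-2} = \cdots = G_{-p_0+1} = 0 = H_{-1} = H_{-2} = \cdots = H_{-q_0+1}$, 

$\begin{pmatrix} D_1' \\ \vdots \\ D_{p_0+q_0}' \end{pmatrix} := - \begin{pmatrix} G_{q_0} & G_{q_0-1} & \cdots & G_{-p_0+1} \\ G_{q_0+1} & G_{q_0} & \cdots & G_{-p_0+2} \\ \vdots & & & \vdots \\ G_{p_0+q_0-1} & G_{p_0+q_0-2} & \cdots & G_0 \\ H_{p_0} & H_{p_0-1} & \cdots & H_{-q_0+1} \\ H_{p_0+1} & H_{p_0} & \cdots & H_{-q_0+2} \\ \vdots & & & \vdots \\ H_{p_0+q_0-1} & H_{p_0+q_0-2} & \cdots & H_0 \end{pmatrix}^{-1} \begin{pmatrix} G_{q_0+1} \\ \vdots \\ G_{p_0+q_0} \\ H_{p_0+1} \\ \vdots \\ H_{p_0+q_0} \end{pmatrix}.$

Note that $D(L)G_t' = 0$ for $t = q_0 + 1, \ldots, p_0 + q_0$, and $D(L)H_t' = 0$ for 
$t = p_0 + 1, \ldots, p_0 + q_0$. 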

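Let $\{\Psi_t^{(1)}, \ldots, \Psi_t^{(p_0+q_0)}\}$ be a set of $k \times k$ matrices forming a fundamental 
system of solutions of the homogeneous linear difference equation 
associated with $D(L)$. Such a system can be obtained, for instance, from Green's 
matrices of the operator $D(L)$ [see, e.g., Hallin (1986)]. Defining 

$\bar{\Psi}_m(\theta_0) := \begin{pmatrix} \Psi^{(1)}_{\pi+1} & \cdots & \Psi^{(p_0+q_0)}_{\pi+1} \\ \Psi^{(1)}_{\pi+2} & \cdots & \Psi^{(p_0+q_0)}_{\pi+2} \\ \vdots & & \vdots \\ \Psi^{(1)}_{m} & \cdots & \Psi^{(p_0+q_0)}_{m} \end{pmatrix} \otimes I_k \qquad (m > \pi),$

the Casorati matrix $C_\Psi$ associated with $D(L)$ is $\bar{\Psi}_{\pi_0}$. Putting 

$P_{\theta_0} := \begin{pmatrix} I_{k^2\pi} & 0 \\ 0 & C_\Psi^{-1} \end{pmatrix} \quad\text{and}\quad Q^{(n)}_{\theta_0} := H^{(r)}_{n-1}\, B'^{(l)}_{n-1} \begin{pmatrix} I_{k^2\pi} & 0 \\ 0 & \bar{\Psi}_{n-1} \end{pmatrix},$

let 

$S^{(n)}_{\Sigma,f}(\theta_0) := \bigl((n-1)^{1/2}(\mathrm{vec}\,\Gamma^{(n)}_{1;\Sigma,f}(\theta_0))', \ldots, (n-i)^{1/2}(\mathrm{vec}\,\Gamma^{(n)}_{i;\Sigma,f}(\theta_0))', \ldots, (\mathrm{vec}\,\Gamma^{(n)}_{n-1;\Sigma,f}(\theta_0))'\bigr)',$

$n^{1/2}\, T^{(n)}_{\Sigma,f}(\theta_0) := Q^{(n)\prime}_{\theta_0}\, S^{(n)}_{\Sigma,f}(\theta_0)$

and 

(7) $J_{\theta_0,\Sigma} := \lim_{n \to +\infty} Q^{(n)\prime}_{\theta_0}\, [I_{n-1} \otimes (\Sigma \otimes \Sigma^{-1})]\, Q^{(n)}_{\theta_0}$
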
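[convergence in (7) follows from the exponential decrease, as $u \to \infty$, of 
Green's matrices $G_u$ and $H_u$; see the comment after Assumption (A)]. Local 
asymptotic normality, for fixed $\Sigma$ and $f$, of the model described in Section 2 
then can be stated in the following way. 

Proposition 1. Assume that Assumptions (A), (B1') and (B2) hold. 
Then the logarithm $L^{(n)}_{\theta_0 + n^{-1/2}\tau^{(n)}/\theta_0;\Sigma,f}$ of the likelihood ratio of 
$\mathcal{H}^{(n)}(\theta_0 + n^{-1/2}\tau^{(n)}, \Sigma, f)$ with respect to $\mathcal{H}^{(n)}(\theta_0, \Sigma, f)$ is such that 

$L^{(n)}_{\theta_0 + n^{-1/2}\tau^{(n)}/\theta_0;\Sigma,f}(X^{(n)}) = (\tau^{(n)})' \Delta^{(n)}_{\Sigma,f}(\theta_0) - \tfrac{1}{2} (\tau^{(n)})'\, \Gamma_{\Sigma,f}(\theta_0)\, \tau^{(n)} + o_P(1),$

as $n \to \infty$, under $\mathcal{H}^{(n)}(\theta_0, \Sigma, f)$, with the central sequence 

(8) $\Delta^{(n)}_{\Sigma,f}(\theta_0) := n^{1/2} M_{\theta_0}' P_{\theta_0}' T^{(n)}_{\Sigma,f}(\theta_0),$

and the asymptotic information matrix $\Gamma_{\Sigma,f}(\theta_0) := \frac{\mu_{k+1;f}\, \mathcal{I}_{k,f}}{k^2 \mu_{k-1;f}}\, N_{\theta_0,\Sigma}$, where 

(9) $N_{\theta_0,\Sigma} := M_{\theta_0}' P_{\theta_0}' J_{\theta_0,\Sigma} P_{\theta_0} M_{\theta_0}.$

Moreover, $\Delta^{(n)}_{\Sigma,f}(\theta_0)$, still under $\mathcal{H}^{(n)}(\theta_0, \Sigma, f)$, is asymptotically 
$\mathcal{N}_{k^2(p_1+q_1)}(0, \Gamma_{\Sigma,f}(\theta_0))$. 
For the proof see the Appendix. 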

The benefits of expressing $\Delta^{(n)}_{\Sigma,f}$ and $\Gamma_{\Sigma,f}$ as in (8) and (9) stem from the 
following elementary facts. Sequences of local experiments under LAN 
converge, in the Le Cam sense, to Gaussian shift experiments, so that optimal 
tests for the limit Gaussian shifts determine the form of locally 
asymptotically optimal tests for the original problem. Consider the problem of testing 
$\mathcal{H}_0 : \tau = 0$ against $\mathcal{H}_1 : \tau \neq 0$ in the single-observation $\ell$-variate Gaussian 
shift experiment under which $\Delta \sim \mathcal{N}_\ell(\Gamma\tau, \Gamma)$. Let $m := \mathrm{rank}(\Gamma)$. If $m = \ell$, 
the optimal ($\alpha$-level maximin) test consists in rejecting $\mathcal{H}_0$ for large values 
of $\Delta'\Gamma^{-1}\Delta$, the null distribution of which is $\chi^2_\ell$. Whenever $m < \ell$, $\Gamma$ is 
singular, and this does not hold anymore. However, if we succeed in writing $\Delta$ 
and $\Gamma$ in the form 

(10) $\Delta = M' \mathring{\Delta} \quad\text{and}\quad \Gamma = M' \mathring{\Gamma} M,$

where both the $m \times \ell$ matrix $M$ and the $m \times m$ matrix $\mathring{\Gamma}$ have full rank, the 
problem of testing $\mathcal{H}_0 : \tau = 0$ in the singular $\ell$-variate Gaussian shift 
experiment for $\Delta \sim \mathcal{N}_\ell(\Gamma\tau, \Gamma)$ is strictly the same as that of testing $\mathcal{H}_0 : M\tau = 0$ 
in the full-rank $m$-variate Gaussian shift experiment under which 
$\mathring{\Delta} \sim \mathcal{N}_m(\mathring{\Gamma} M \tau, \mathring{\Gamma})$. It follows that the optimal ($\alpha$-level maximin) test for $\mathcal{H}_0$ 
rejects the null hypothesis for large values of $\mathring{\Delta}' \mathring{\Gamma}^{-1} \mathring{\Delta}$, which is $\chi^2_m$ under $\mathcal{H}_0$. 
Now, clearly, $\mathring{\Delta}' \mathring{\Gamma}^{-1} \mathring{\Delta} = \Delta' \Gamma^{-} \Delta$, where $\Gamma^{-}$ denotes an arbitrary 
generalized inverse of $\Gamma$, so that, if we succeed in writing $\Delta$ and $\Gamma$ in the form (10), 
the somewhat unpleasant recourse to generalized inverses is not required 
anymore. This is exactly what expressions (8) and (9) allow for. As a 
consequence, the degeneracy of the information matrix remains implicit in 
the explicit forms of the optimal test statistics in Propositions 4, 6 and 7. 
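
The identity between the full-rank quadratic form and the one based on a generalized inverse can be illustrated numerically. The sketch below draws a random factorization of the form (10) (all matrices are arbitrary, hypothetical values), then evaluates the quadratic form with the Moore-Penrose inverse and with a second, different generalized inverse.

```python
import numpy as np

def moore_penrose(S, tol=1e-10):
    """Moore-Penrose inverse of a symmetric positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(S)
    keep = vals > tol * vals.max()
    return (vecs[:, keep] / vals[keep]) @ vecs[:, keep].T

rng = np.random.default_rng(2)
m, ell = 2, 4
M = rng.standard_normal((m, ell))        # m x ell, full row rank with probability one
W = rng.standard_normal((m, m))
Gamma_ring = W @ W.T + np.eye(m)         # full-rank m x m information matrix
Delta_ring = rng.standard_normal(m)

Gamma = M.T @ Gamma_ring @ M             # singular ell x ell matrix, as in (10)
Delta = M.T @ Delta_ring

q_full_rank = Delta_ring @ np.linalg.solve(Gamma_ring, Delta_ring)

G_plus = moore_penrose(Gamma)
q_pinv = Delta @ G_plus @ Delta

# Another generalized inverse: G satisfies Gamma G Gamma = Gamma for any V.
V = rng.standard_normal((ell, ell))
G_other = G_plus + V - G_plus @ Gamma @ V @ Gamma @ G_plus
q_other = Delta @ G_other @ Delta
```

The three values `q_full_rank`, `q_pinv` and `q_other` should coincide up to rounding error, which is the reason why the factorized form makes generalized inverses unnecessary.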
4. Multivariate ranks and signs: invariance and equivariance. 

4.1. Pseudo-Mahalanobis distances and Tyler residuals. Likelihoods, and 
hence the central sequences (8), are measurable, jointly, with respect to 
two types of statistics: 

(i) the distances $d_t^{(n)}(\theta_0, \Sigma)$ between standardized residuals $\Sigma^{-1/2} Z_t^{(n)}(\theta_0)$ 
and the origin in $\mathbb{R}^k$, and 

(ii) the normalized standardized residuals $U_t^{(n)}(\theta_0, \Sigma) := \Sigma^{-1/2} Z_t^{(n)}(\theta_0)/d_t^{(n)}(\theta_0, \Sigma)$. 

The (univariate) distances $d_t^{(n)}(\theta_0, \Sigma)$ are i.i.d. over the positive real line, 
with density (4); their ranks thus have the same distribution-freeness and 
maximal invariance properties as those of the absolute values of any 
univariate symmetrically distributed $n$-tuple. The normed standardized residuals 
$U_t^{(n)}(\theta_0, \Sigma)$ under $\mathcal{H}^{(n)}(\theta_0, \Sigma, f)$ are uniformly distributed over the unit 
sphere, and, hence, can be viewed as multivariate generalizations of signs. 

Unfortunately, both $d_t^{(n)}(\theta_0, \Sigma)$ and $U_t^{(n)}(\theta_0, \Sigma)$ involve, in a crucial way,
the shape parameter $\Sigma$, which, in practice, is never specified, and has to
be estimated from the observations. If the actual underlying distribution
has finite second-order moments [i.e., under Assumption (B1')], a "natural"
consistent candidate for estimating $\Sigma$ is the empirical covariance matrix
$n^{-1}\sum_{t=1}^{n} Z_t^{(n)}(\theta_0)(Z_t^{(n)}(\theta_0))'$. Finite second-order moments, however,
are too strong a requirement, as we would like to build testing procedures
that are optimal under the assumptions of Proposition 1, but remain valid
under much milder conditions, including the case of infinite variances. This
rules out the empirical covariance as an estimate of $\Sigma$ and, under the weaker
Assumption (B1), which does not require anything about the moments of
the underlying distribution, we propose to use Tyler's estimator of scatter
[see Tyler (1987)].

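This estimator is defined as follows. For any n-tuple $Z^{(n)} := (Z_1^{(n)}, Z_2^{(n)}, \ldots, Z_n^{(n)})$
of k-dimensional vectors, denote by $C_{\mathrm{Tyl}}^{(n)} := C_{\mathrm{Tyl}}^{(n)}(Z^{(n)})$ the [unique for $n > k(k-1)$]
upper triangular $k \times k$ matrix with positive diagonal elements and
a "1" in the upper left corner that satisfies

(11) $\frac{1}{n}\sum_{t=1}^{n}\left(\frac{C_{\mathrm{Tyl}}^{(n)}Z_t^{(n)}}{\|C_{\mathrm{Tyl}}^{(n)}Z_t^{(n)}\|}\right)\left(\frac{C_{\mathrm{Tyl}}^{(n)}Z_t^{(n)}}{\|C_{\mathrm{Tyl}}^{(n)}Z_t^{(n)}\|}\right)' = \frac{1}{k}I_k.$

Tyler's estimator of scatter is defined as $\hat\Sigma^{(n)} := (C_{\mathrm{Tyl}}^{(n)\prime}C_{\mathrm{Tyl}}^{(n)})^{-1}$.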



When computed from the n-tuple of residuals $Z_t^{(n)}(\theta_0)$, $t = 1, \ldots, n$, Tyler's
estimator is root-n consistent, up to a multiplicative factor, for the shape
matrix $\Sigma$. More precisely, there exists a positive real $a$ such that $\sqrt{n}(\hat\Sigma^{(n)} - a\Sigma)$
is $O_{\mathrm{P}}(1)$ as $n \to \infty$ under $\bigcup_f \mathcal{H}^{(n)}(\theta_0, \Sigma, f)$. Tyler's estimator is clearly
invariant under permutations of the residuals $Z_t^{(n)}(\theta_0)$. Moreover, $C_{\mathrm{Tyl}}^{(n)}$ is
strictly affine-equivariant, since

(12) $C_{\mathrm{Tyl}}^{(n)}(MZ^{(n)}) = d\,O\,C_{\mathrm{Tyl}}^{(n)}(Z^{(n)})\,M^{-1}$

for some orthogonal matrix $O$ and some scalar $d$ that depends on $Z^{(n)}$. See
Randles (2000) for a proof.

The corresponding distances from the origin $d_t^{(n)}(\theta_0, \hat\Sigma^{(n)})$ will be called
pseudo-Mahalanobis distances, in order to stress the fact that Tyler's
estimator of scatter is used instead of the usual sample covariance matrix. The
normed standardized residuals $W_t^{(n)}(\theta_0) := U_t^{(n)}(\theta_0, \hat\Sigma^{(n)})$ (call them Tyler
residuals) will be used as a multivariate concept of signs.

4.2. The pseudo-Mahalanobis ranks. As usual in rank-based nonparametric
inference, the pseudo-Mahalanobis distances $d_t^{(n)}(\theta_0, \hat\Sigma^{(n)})$ will be
replaced by their ranks. This idea, in the multivariate context, actually
goes back to Peters and Randles (1990), who (in a one-sample location
context) proved a consistency result, which in the present situation can
be stated as follows. Denote by $\hat R_t^{(n)}(\theta_0)$ the rank of $d_t^{(n)}(\theta_0, \hat\Sigma^{(n)})$ among
$d_1^{(n)}(\theta_0, \hat\Sigma^{(n)}), \ldots, d_n^{(n)}(\theta_0, \hat\Sigma^{(n)})$, and by $R_t^{(n)}(\theta_0, \Sigma)$ the rank of $d_t^{(n)}(\theta_0, \Sigma)$
among $d_1^{(n)}(\theta_0, \Sigma), \ldots, d_n^{(n)}(\theta_0, \Sigma)$.

Lemma 1 [Peters and Randles (1990)]. For all $t$, $\hat R_t^{(n)}(\theta_0) - R_t^{(n)}(\theta_0, \Sigma)$
is $o_{\mathrm{P}}(n)$ as $n \to \infty$.

For each $\Sigma$ and $n$, consider the group of continuous monotone radial
transformations $\mathcal{G}_\Sigma^{(n)} = \{\mathcal{G}_g^{(n)}\}$, acting on $(\mathbb{R}^k)^n$, characterized by

$\mathcal{G}_g^{(n)}(Z_1^{(n)}(\theta_0), \ldots, Z_n^{(n)}(\theta_0))$
$:= (g(d_1^{(n)}(\theta_0, \Sigma))\,\Sigma^{1/2}U_1^{(n)}(\theta_0, \Sigma), \ldots, g(d_n^{(n)}(\theta_0, \Sigma))\,\Sigma^{1/2}U_n^{(n)}(\theta_0, \Sigma))$,

where $g : \mathbb{R}^+ \to \mathbb{R}^+$ is continuous, monotone increasing, and such that $g(0) = 0$
and $\lim_{r\to\infty} g(r) = \infty$. The group $\mathcal{G}_\Sigma^{(n)}$ is a generating group for the
submodel $\bigcup_f \mathcal{H}^{(n)}(\theta_0, \Sigma, f)$, where the union is taken with respect to the set of
all possible nonvanishing radial densities. The ranks $R_t^{(n)}(\theta_0, \Sigma)$, $t = 1, \ldots, n$,
are a maximal invariant for $\mathcal{G}_\Sigma^{(n)}$. Lemma 1 thus is an indication that statistics
based on the pseudo-Mahalanobis ranks $\hat R_t^{(n)}(\theta_0)$ may be asymptotically
invariant, in the sense of being asymptotically equivalent to their counterparts
based on the unobservable, strictly invariant ranks $R_t^{(n)}(\theta_0, \Sigma)$. This
will indeed be the case with the test statistics we are proposing (see
Propositions 2 and 4).

Note also that the equivariance property (12) of $C_{\mathrm{Tyl}}^{(n)}$ under affine
transformations is sufficient to make the pseudo-Mahalanobis ranks $\hat R_t^{(n)}(\theta_0)$
strictly affine-invariant.
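
The affine invariance of the pseudo-Mahalanobis ranks can be illustrated numerically. The sketch below is restricted to dimension $k = 2$ and rests on assumptions that are not the paper's: it computes Tyler's shape matrix with the standard fixed-point iteration (rather than through the triangular matrix of (11)), normalizes it to trace 2 instead of fixing the upper-left entry, and uses simulated residuals and an arbitrary matrix $M$.

```python
import math
import random


def tyler_shape_2d(points, n_iter=500):
    """Fixed-point iteration for Tyler's shape matrix when k = 2.

    Returns (a, b, d) for the symmetric matrix [[a, b], [b, d]], scaled to
    trace 2 (a normalization chosen for this sketch only).
    """
    a, b, d = 1.0, 0.0, 1.0
    n = len(points)
    for _ in range(n_iter):
        det = a * d - b * b
        sa = sb = sd = 0.0
        for x, y in points:
            # z' V^{-1} z with V^{-1} = [[d, -b], [-b, a]] / det
            q = (d * x * x - 2.0 * b * x * y + a * y * y) / det
            sa += x * x / q
            sb += x * y / q
            sd += y * y / q
        a, b, d = 2.0 * sa / n, 2.0 * sb / n, 2.0 * sd / n
        tr = a + d
        a, b, d = 2.0 * a / tr, 2.0 * b / tr, 2.0 * d / tr
    return a, b, d


def pseudo_mahalanobis_ranks(points):
    """Ranks (1..n) of the distances (z' V^{-1} z)^{1/2}, V = Tyler's shape."""
    a, b, d = tyler_shape_2d(points)
    det = a * d - b * b
    dist = [math.sqrt((d * x * x - 2.0 * b * x * y + a * y * y) / det)
            for x, y in points]
    order = sorted(range(len(points)), key=lambda t: dist[t])
    ranks = [0] * len(points)
    for r, t in enumerate(order, start=1):
        ranks[t] = r
    return ranks


rng = random.Random(0)
residuals = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
# Hypothetical nonsingular transformation M = [[2, 1], [0, 3]].
transformed = [(2.0 * x + 1.0 * y, 3.0 * y) for x, y in residuals]
ranks_original = pseudo_mahalanobis_ranks(residuals)
ranks_transformed = pseudo_mahalanobis_ranks(transformed)
print(ranks_original == ranks_transformed)
```

The shape estimate changes with $M$, but only by a congruence and a scalar factor, so the ordering of the distances, and hence the ranks, is unaffected.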

4.3. Tyler residuals. The transformation $C_{\mathrm{Tyl}}^{(n)}$ characterized in (11)
actually sphericizes the problem, in the sense that it transforms elliptically
distributed residuals into spherically distributed ones, estimating $U_t^{(n)}(\theta_0, \Sigma)$
by means of the Tyler residuals $W_t^{(n)} := W_t^{(n)}(\theta_0) := C_{\mathrm{Tyl}}^{(n)}Z_t^{(n)}(\theta_0)/\|C_{\mathrm{Tyl}}^{(n)}Z_t^{(n)}(\theta_0)\|$,
with the following consistency property.

Lemma 2. Under $\bigcup_f \mathcal{H}^{(n)}(\theta_0, \Sigma, f)$, $\max_{1\le t\le n}\{\|W_t^{(n)}(\theta_0) - U_t^{(n)}(\theta_0, \Sigma)\|\} = O_{\mathrm{P}}(n^{-1/2})$
as $n \to \infty$.

For the proof see the Appendix.

It is clear from (11) that $C_{\mathrm{Tyl}}^{(n)}(a_1Z_1^{(n)}, \ldots, a_nZ_n^{(n)}) = C_{\mathrm{Tyl}}^{(n)}(Z_1^{(n)}, \ldots, Z_n^{(n)})$
for any real numbers $a_1, \ldots, a_n$, so that $C_{\mathrm{Tyl}}^{(n)}$ and, therefore, the Tyler
residuals $W_t^{(n)}$ themselves, are strictly invariant under radial monotone
transformations. Incidentally, it readily follows from (12) that the Tyler residuals
enjoy the following strict equivariance property:

Lemma 3. Denote by $W_t^{(n)}(M)$ the Tyler residuals computed from the
transformed residuals $M(Z_1^{(n)}, \ldots, Z_n^{(n)})$. Then $W_t^{(n)}(M) = OW_t^{(n)}$, where
$O$ is the orthogonal matrix in (12).

For the proof see the Appendix.

Note that Lemma 3 implies that any orthogonally invariant function of
the Tyler residuals is strictly affine-invariant. In particular, statistics that
are measurable with respect to the cosines of the Euclidean angles between
the $W_t^{(n)}$'s (that is, measurable with respect to the scalar products
$W_t^{(n)\prime}W_{\tilde t}^{(n)}$) turn out to be affine-invariant. This shows that the Tyler
residuals could be used with the same success (consistency, invariance
properties) as Randles' interdirections in the construction of the locally
asymptotically optimal affine-invariant tests for randomness proposed in Hallin
and Paindaveine (2002b). This "angle-based" approach (as opposed to the
"interdirection"-based one adopted there) is discussed, for the one-sample
location problem, in Hallin and Paindaveine (2002c).

For $k = 1$, the Tyler residuals and pseudo-Mahalanobis ranks reduce to
the signs and the ranks of absolute values of the residuals, respectively.
The statistics we are considering in Sections 5 and 6 thus are multivariate
generalizations of the serial signed-rank statistics considered in Hallin and
Puri (1991).
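
The univariate reduction can be made concrete with a few lines of code. This is a minimal sketch with made-up residuals; it assumes nonzero residuals with no ties among absolute values, as is the case almost surely under a continuous distribution.

```python
def signs_and_ranks(residuals):
    """Signs and ranks of absolute values of a univariate residual series.

    For k = 1 a Tyler residual z / |z| is the sign of z, and the rank of the
    (pseudo-Mahalanobis) distance is the rank of |z|.
    """
    signs = [1 if z > 0 else -1 for z in residuals]
    order = sorted(range(len(residuals)), key=lambda t: abs(residuals[t]))
    ranks = [0] * len(residuals)
    for r, t in enumerate(order, start=1):
        ranks[t] = r
    return signs, ranks


signs, ranks = signs_and_ranks([0.5, -2.0, 1.5, -0.1])
print(signs, ranks)   # [1, -1, 1, -1] [2, 4, 3, 1]
```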



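5. Rank-based cross-covariance matrices. The rank-based versions of
the cross-covariance matrices (6) we are proposing are of the form

(13) $\tilde\Gamma_{i;K}^{(n)}(\theta_0) := C_{\mathrm{Tyl}}^{(n)\prime}\left(\frac{1}{n-i}\sum_{t=i+1}^{n} K_1\!\left(\frac{\hat R_t^{(n)}(\theta_0)}{n+1}\right)K_2\!\left(\frac{\hat R_{t-i}^{(n)}(\theta_0)}{n+1}\right)W_t^{(n)}(\theta_0)W_{t-i}^{(n)\prime}(\theta_0)\right)(C_{\mathrm{Tyl}}^{(n)\prime})^{-1},$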
where $K_1, K_2 : ]0, 1[ \to \mathbb{R}$ are two score functions as in Assumption (C);
call (13) a K-cross-covariance matrix. Let us briefly review some examples
of score functions extending those that are classically considered in
univariate rank-based inference. The simplest scores are the constant ones
($K_1(u) = K_2(u) = 1$), and yield multivariate sign cross-covariance matrices

$C_{\mathrm{Tyl}}^{(n)\prime}\left(\frac{1}{n-i}\sum_{t=i+1}^{n} W_t^{(n)}(\theta_0)W_{t-i}^{(n)\prime}(\theta_0)\right)(C_{\mathrm{Tyl}}^{(n)\prime})^{-1},$

leading to serial versions of Randles' multivariate sign test statistic [Randles
(2000)]. Linear scores ($K_1(u) = K_2(u) = u$) yield cross-covariance matrices
of the Spearman (or Wilcoxon, as only the ranks themselves are involved)
type,



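(14) $C_{\mathrm{Tyl}}^{(n)\prime}\left(\frac{1}{(n-i)(n+1)^2}\sum_{t=i+1}^{n} \hat R_t^{(n)}(\theta_0)\hat R_{t-i}^{(n)}(\theta_0)\,W_t^{(n)}(\theta_0)W_{t-i}^{(n)\prime}(\theta_0)\right)(C_{\mathrm{Tyl}}^{(n)\prime})^{-1}.$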



The score functions allowing for local asymptotic optimality under radial
density $f_*$ are $K_1 := \varphi_{f_*}\circ\tilde F_{*k}^{-1}$ and $K_2 := \tilde F_{*k}^{-1}$ (see Proposition 4). The most
familiar example is that of the van der Waerden scores, associated with
normal radial densities ($f_*(r) := \phi(r) = \exp(-r^2/2)$), yielding the van der
Waerden cross-covariance matrices

(15) $C_{\mathrm{Tyl}}^{(n)\prime}\left(\frac{1}{n-i}\sum_{t=i+1}^{n}\sqrt{\Psi_k^{-1}\!\left(\frac{\hat R_t^{(n)}(\theta_0)}{n+1}\right)}\sqrt{\Psi_k^{-1}\!\left(\frac{\hat R_{t-i}^{(n)}(\theta_0)}{n+1}\right)}\,W_t^{(n)}(\theta_0)W_{t-i}^{(n)\prime}(\theta_0)\right)(C_{\mathrm{Tyl}}^{(n)\prime})^{-1},$

where $\Psi_k$ stands for the chi-square distribution function with $k$ degrees
of freedom. The Laplace scores, associated with double-exponential radial
densities ($f_*(r) := \exp(-r)$), are another classical example.
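
To make the role of the score functions concrete, here is a minimal sketch of the inner matrix of a K-cross-covariance, that is, the weighted average of the products $W_tW_{t-i}'$ before conjugation by $C_{\mathrm{Tyl}}^{(n)\prime}$ and its inverse. The function names, the three unit vectors and the ranks are hypothetical; the van der Waerden score is written for $k = 2$ only, where the chi-square quantile function has the closed form $-2\log(1-u)$.

```python
import math


def inner_cross_cov(tyler_residuals, ranks, lag, k1, k2):
    """Average of K1(R_t/(n+1)) K2(R_{t-lag}/(n+1)) W_t W_{t-lag}' over t.

    Only the inner k x k matrix is computed; the conjugation by the Tyler
    matrix and its inverse is deliberately left out of this sketch.
    """
    n = len(tyler_residuals)
    k = len(tyler_residuals[0])
    acc = [[0.0] * k for _ in range(k)]
    for t in range(lag, n):
        w_t, w_lag = tyler_residuals[t], tyler_residuals[t - lag]
        weight = k1(ranks[t] / (n + 1)) * k2(ranks[t - lag] / (n + 1))
        for a in range(k):
            for b in range(k):
                acc[a][b] += weight * w_t[a] * w_lag[b]
    return [[v / (n - lag) for v in row] for row in acc]


def sign_score(u):
    return 1.0


def linear_score(u):
    return u


def van_der_waerden_score_2d(u):
    # For k = 2 the chi-square cdf is 1 - exp(-x/2), so its inverse is -2 log(1 - u).
    return math.sqrt(-2.0 * math.log(1.0 - u))


W = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]   # hypothetical unit vectors
R = [1, 2, 3]                               # hypothetical ranks
sign_matrix = inner_cross_cov(W, R, 1, sign_score, sign_score)
spearman_matrix = inner_cross_cov(W, R, 1, linear_score, linear_score)
vdw_matrix = inner_cross_cov(W, R, 1, van_der_waerden_score_2d,
                             van_der_waerden_score_2d)
print(sign_matrix, spearman_matrix)
```

With constant scores only the directions matter; linear and van der Waerden scores additionally weight each product by a function of the two ranks involved.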

In order to study the asymptotic behavior of the K-cross-covariance
matrices (13) associated with general score functions, under the sequence of
null hypotheses as well as under sequences of local alternatives, we first
establish the following asymptotic representation and joint normality results;
see the Appendix for the proofs.

Proposition 2. Let Assumptions (B1) and (C) hold. Then, writing

$\Gamma_{i;K;\Sigma,f}^{(n)}(\theta_0) := \Sigma'^{-1/2}\left(\frac{1}{n-i}\sum_{t=i+1}^{n} K_1(\tilde F_k(d_t^{(n)}(\theta_0, \Sigma)))\,K_2(\tilde F_k(d_{t-i}^{(n)}(\theta_0, \Sigma)))\,U_t^{(n)}(\theta_0, \Sigma)U_{t-i}^{(n)\prime}(\theta_0, \Sigma)\right)\Sigma'^{1/2},$

$\mathrm{vec}(\tilde\Gamma_{i;K}^{(n)}(\theta_0) - \Gamma_{i;K;\Sigma,f}^{(n)}(\theta_0))$ is $o_{\mathrm{P}}(n^{-1/2})$ under $\mathcal{H}^{(n)}(\theta_0, \Sigma, f)$ as $n \to \infty$.

For the proof see the Appendix.

For any square-integrable score function $K$ defined over $]0, 1[$, let
$\mathrm{E}[K^2(U)] := \int_0^1 K^2(u)\,du$, $D_k(K; f) := \int_0^1 K(u)\tilde F_k^{-1}(u)\,du$, and
$C_k(K; f) := \int_0^1 K(u)\,\varphi_f\circ\tilde F_k^{-1}(u)\,du$.
Then we have the following:



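Proposition 3. Let Assumptions (A), (B1'), (B2) and (C) hold. For
any integer $m$, the vector

$S_{m;K;\Sigma,f}^{(n)}(\theta_0) := ((n-1)^{1/2}(\mathrm{vec}\,\Gamma_{1;K;\Sigma,f}^{(n)}(\theta_0))', \ldots, (n-m)^{1/2}(\mathrm{vec}\,\Gamma_{m;K;\Sigma,f}^{(n)}(\theta_0))')'$

is asymptotically normal under $\mathcal{H}^{(n)}(\theta_0, \Sigma, f)$ and under $\mathcal{H}^{(n)}(\theta_0 + n^{-1/2}\tau, \Sigma, f)$,
with mean 0 under $\mathcal{H}^{(n)}(\theta_0, \Sigma, f)$ and mean

$\frac{1}{k^2}D_k(K_2; f)\,C_k(K_1; f)\,[I_m \otimes (\Sigma \otimes \Sigma^{-1})]\,Q_{\theta_0}^{(m+1)}P_{\theta_0}M_{\theta_0}\tau$

under $\mathcal{H}^{(n)}(\theta_0 + n^{-1/2}\tau, \Sigma, f)$, and with covariance matrix

$\frac{1}{k^2}\mathrm{E}[K_1^2(U)]\,\mathrm{E}[K_2^2(U)]\,[I_m \otimes (\Sigma \otimes \Sigma^{-1})]$

under both.

For the proof see the Appendix.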

In order to compare Proposition 3 and the corresponding univariate re- 
sults in Hallin and Puri (1991, 1994), note that 

Q(7' +1) P 0o M 0o rM = ((aW + bi">)', • • • , (aL n) + b^)')', 

with 

min(pi,i) i—j min(qo,i— j— k) 

af } := E E E (G^-^B^H'JvecTf 

j=l k=Q 1=0 

and 

min(qi ,i) 

bf:= E (I fc ®H 4 _,)vec4 n) . 

3=1 

Propositions 2 and 3 show that K-cross-covariance matrices, while based
on multivariate generalizations of signs and ranks, enjoy the same intuitive
interpretation and inferential properties as their (traditional) parametric
Gaussian counterparts (6). Proposition 3, for instance, immediately
allows for constructing non-Gaussian portmanteau test statistics and deriving
their local powers. Just as their classical versions (based on the classical
cross-covariance matrices (6)), such portmanteau tests, however, fail to exploit
the information available on the serial dependence structure of the
observations, and hence are not optimal. Section 6 is devoted to the construction
of locally asymptotically optimal tests based on K-cross-covariances.

6. Optimal tests. We are now ready to state the main results of this 
paper: the optimal testing procedures for the problem under study, their in- 
variance and distribution-freeness features, as well as their local powers and 
optimality properties. Optimality here means local asymptotic minimaxity, 
either based on fixed-score test statistics, at some selected radial density 
or, based on estimated scores, uniformly over some class $\mathcal{F}$ of densities.

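6.1. Fixed-score test statistics. Letting

$\tilde S_K^{(n)}(\theta_0) := ((n-1)^{1/2}(\mathrm{vec}\,\tilde\Gamma_{1;K}^{(n)}(\theta_0))', \ldots, (n-i)^{1/2}(\mathrm{vec}\,\tilde\Gamma_{i;K}^{(n)}(\theta_0))', \ldots, (\mathrm{vec}\,\tilde\Gamma_{n-1;K}^{(n)}(\theta_0))')'$,

define

$n^{1/2}\tilde T_K^{(n)}(\theta_0) := Q_{\theta_0}^{(n)\prime}\tilde S_K^{(n)}(\theta_0)$ and
(17) $J_{\theta_0,\hat\Sigma}^{(n)} := Q_{\theta_0}^{(n)\prime}[I_{n-1} \otimes (\hat\Sigma^{(n)} \otimes (\hat\Sigma^{(n)})^{-1})]\,Q_{\theta_0}^{(n)}$,

where $\hat\Sigma^{(n)}$ denotes Tyler's estimator of scatter (see Section 4.1). Finally, let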

The test statistics $\tilde Q_{f_*}^{(n)}(\theta_0)$ allowing for local asymptotic optimality under
radial density $f_*$ are obtained with the score functions $K_1 := \varphi_{f_*}\circ\tilde F_{*k}^{-1}$ and
$K_2 := \tilde F_{*k}^{-1}$. We then have the following proposition.

Proposition 4. Assume that Assumptions (A), (B1), (B2) and (C)
hold. Consider the sequence of rank tests $\tilde\phi_K^{(n)}$ (resp. $\tilde\phi_{f_*}^{(n)}$) that reject the
null hypothesis $\mathcal{H}^{(n)}(\theta_0)$ whenever $\tilde Q_K^{(n)}(\theta_0)$ [resp. $\tilde Q_{f_*}^{(n)}(\theta_0)$] exceeds the
$\alpha$-upper quantile $\chi^2_{k^2\pi_0;1-\alpha}$ of a chi-square distribution with $k^2\pi_0$ degrees of
freedom, where $\pi_0$ is defined in Section 3. Then:

(i) the test statistics $\tilde Q_K^{(n)}(\theta_0)$ do not depend on the particular choice of
the fundamental system $\{\psi_t^{(1)}, \ldots, \psi_t^{(p_0+q_0)}\}$ (see Section 3); for given values
of $p_0$ and $q_0$, they depend on $p_1$ and $q_1$ only through $\pi_0 = \max(p_1 - p_0, q_1 - q_0)$;

(ii) $\tilde Q_K^{(n)}(\theta_0)$ is asymptotically invariant with respect to the group of
continuous monotone radial transformations;

(iii) $\tilde Q_K^{(n)}(\theta_0)$ is asymptotically chi-square with $k^2\pi_0$ degrees of freedom
under $\mathcal{H}^{(n)}(\theta_0)$ (so that $\tilde\phi_K^{(n)}$ has asymptotic level $\alpha$), and

(iv) $\tilde Q_K^{(n)}(\theta_0)$ is asymptotically noncentral chi-square, still with $k^2\pi_0$
degrees of freedom, but with noncentrality parameter

$\frac{1}{k^2}\,\frac{D_k^2(K_2; f)\,C_k^2(K_1; f)}{\mathrm{E}[K_1^2(U)]\,\mathrm{E}[K_2^2(U)]}\,\tau'N_{\theta_0,\Sigma}\tau$

under $\mathcal{H}^{(n)}(\theta_0 + n^{-1/2}\tau, \Sigma, f)$, provided, however, that (B1) is reinforced
into (B1'), where $N_{\theta_0,\Sigma}$ is defined in (9);

(v) if we assume that $f_*$ satisfies Assumptions (B1'), (B2) and (C), the
sequence of tests $\tilde\phi_{f_*}^{(n)}$ is locally asymptotically maximin for $\mathcal{H}^{(n)}(\theta_0)$ against
$\bigcup_{\theta\neq\theta_0}\bigcup_\Sigma \mathcal{H}^{(n)}(\theta, \Sigma, f_*)$, at probability level $\alpha$.

For the proof see the Appendix.

Again, there is no reason to expect the test statistic to be affine-invariant,
since the testing problem itself, in general, is not; see Hallin and Paindaveine
(2003). Nevertheless, the following proposition establishes that whenever the
testing problem under study is affine-invariant (e.g., the problem of testing
randomness against VARMA dependence), then the test statistics $\tilde Q_K^{(n)}(\theta_0)$
also are affine-invariant.

Proposition 5. (i) The null hypothesis $\mathcal{H}^{(n)}(\theta_0)$ is invariant under
affine transformations if and only if $\theta_0$ is such that $A_i = a_iI_k$ for all $i =
1, \ldots, p_0$ and $B_j = b_jI_k$ for all $j = 1, \ldots, q_0$.

(ii) When the null hypothesis $\mathcal{H}^{(n)}(\theta_0)$ is affine-invariant, then $\tilde Q_K^{(n)}(\theta_0)$
also is.



For the proof see the Appendix. 



6.2. Estimated-score test statistics. The tests $\tilde\phi_{f_*}^{(n)}$ considered in
Proposition 4 achieve parametric efficiency at radial density $f_*$. ARMA models,
though, under adequate assumptions, are adaptive; this has been shown
formally in the univariate case only [without even requiring symmetric
innovation densities; see, e.g., Drost, Klaassen and Werker (1997)], but is very
likely to hold also in higher dimensions. An adaptive optimality property, that
is, parametric optimality at all $f_*$, thus can be expected, provided that
estimated scores are considered. Proposition 6 shows that this, indeed, is the
case.

An adaptive procedure could be based on the score function $\varphi_{\hat f}$ associated
with an adequate estimator $\hat f$ of the radial density. While being uniformly
locally asymptotically maximin, such a procedure, however, would not have
the very desirable properties of rank-based procedures. This is why we rather
propose, in the spirit of Hallin and Werker (2003), an adaptive version of
the rank-based procedures described in Proposition 4.

Let us first assume that $\Sigma$ is known, so that the genuine distances $d_t^{(n)} :=
d_t^{(n)}(\theta_0, \Sigma)$ can be computed from the observations. Denote by $R_t^{(n)} :=
R_t^{(n)}(\theta_0, \Sigma)$ the rank of $d_t^{(n)}$ among $d_1^{(n)}, \ldots, d_n^{(n)}$: under $\mathcal{H}^{(n)}(\theta_0, \Sigma, f)$ the
$R_t^{(n)}$'s are the ranks of i.i.d. random variables with probability density
function $\tilde f_k$. Next consider any continuous kernel density estimator $\hat f_k^{(n)}$ of $\tilde f_k$ that
is measurable with respect to the order statistic of the $d_t^{(n)}$'s and satisfies



E 

(18) 



n + l 

dW xx / >) X n 2 



n + lj J ft V^ + l 

rpi n ) 



Jk 



op(1). 



under $\mathcal{H}^{(n)}(\theta_0, \Sigma, f)$ as $n \to \infty$, where $\hat F_k^{(n)}$ denotes the cumulative
distribution function associated with $\hat f_k^{(n)}$.

A possible choice for $\hat f_k^{(n)}$ satisfying (18) is given in Hajek and Sidak
[(1967), (1.5.7) of Chapter VII]. Another one, specifically constructed for
radial densities, is proposed in Liebscher (2005). An adaptive (still, under
specified $\Sigma$) version of (13) is then

(n) 




x(^»)-'(^ 



where we let (£j (r) := (r) + (k — l)/r [since 9?/(r) = ip^ {r) + {k — l)/r\. 

Of course, in practice $\Sigma$ is not known, and only the estimated distances
$\hat d_t^{(n)} := d_t^{(n)}(\theta_0, \hat\Sigma^{(n)})$ can be computed: instead of the cross-covariance
matrix given in (19), we therefore rather use (with the notation of Section 4)



t=i+l 



(20) x^r 1 

V h ' \n + l 
xwW(0 o )wg'(0 o )j(C^)- 1 , 

where the ranks, the distribution function estimate and the score function
estimate have been replaced by their counterparts computed from the order
statistic of the $\hat d_t^{(n)}$'s. Using the
multivariate Slutsky theorem and working as in the proof of Proposition 2,
we obtain that the difference between (19) and (20) is $o_{\mathrm{P}}(n^{-1/2})$ under
$\bigcup_f \mathcal{H}^{(n)}(\theta_0, \Sigma, f)$ as $n \to \infty$. A direct adaptation of the proof of
Proposition 3.4 in Hallin and Werker (2003) then yields a multivariate generalization
of the (symmetric version of) Proposition 6.4 in Hallin and Werker (1999).
This adaptation, however, requires the Fisher information for location
associated with $\tilde f_k$ to be finite. Denote by $\mathcal{F}$ the set of all radial densities $f$ for
which this condition is satisfied: clearly, $\{f \mid \mathcal{I}_{k,f} < \infty$ and $\int_0^\infty r^{k-3}f(r)\,dr <
\infty\} \subset \mathcal{F}$ and, in the univariate case ($k = 1$), $\mathcal{F} = \{f \mid \mathcal{I}_{1,f} < \infty\}$.

Lemma 4. Let Assumptions (B1) and (B2) hold, and assume that $f \in \mathcal{F}$
satisfies Assumption (C). Then both vec(f • n ^(#o) — j(#o)) andvec(T • \6o) 
r i%,f( d o)) are o v (nr x l 2 ) under «W '(0o,£,/) asn^oo. 

In order to construct adaptive procedures, we still need to estimate the 
asymptotic variance-covariance matrices of either (19) or (20). More pre- 
cisely, we need consistent estimates of Ikj and Vkj ■= Mft+i;/ 1 'fi-k-i-J = 
$\mathrm{E}[(\tilde F_k^{-1}(U))^2]$. Such estimates are provided by



and 



and 




respectively. Note that the product X^v^ (resp. X^v^) depends on the 

estimated radial density (resp. ) only through its density type — 

namely, the scale family {afj t n \ar), a > 0} [resp. {afj { n \ar),a > 0}]. By 
the way, the same property holds true for the adaptive rank-based cross- 
covariances (19) and (20); consequently, without any loss of generality, we 

may assume that and are such that = = 1. 
Defining 

SW(*„) := S&Vo) := ((n - l) 1/2 (vecf ^(0 O ))', ■ ■ ■ , (vecf^Cflo))')' 
and 

n l/2 t (n) (0o) :=n V2T|) (0o) :=Q W'SW(0 O ), 

let 

(21) Q^o^Q^o)^^!^ 

where J^™^ is defined in (17). The same quantities, when computed from 

the f - n) (0o)'s, are denoted by S( n )(0 o ) and T^ n \e ), respectively, yielding 
the test statistic 

«""«>»> : = ilkk * , " l '(«o) ( j« s )-'f (»)<»„) =er< s »)- 

The test statistic (21) has the very desirable property of being conditionally
distribution-free. Conditional upon the $\sigma$-algebra $\mathcal{D}^{(n)}$ generated by the
order statistic of the exact distances $d^{(n)} := (d_1^{(n)}, \ldots, d_n^{(n)})$, indeed:

(a) the vector of ranks $R^{(n)} := (R_1^{(n)}, \ldots, R_n^{(n)})$ is uniformly distributed
over the $n!$ permutations of $(1, \ldots, n)$,

(b) the normalized residuals $U_t^{(n)}$ are i.i.d. and uniformly distributed over
the unit hypersphere, and

(c) the ranks $R^{(n)}$ and the residuals $U_t^{(n)}$ are mutually independent.

The situation is thus entirely parallel to the classical case of univariate 
signed ranks: conditional on T>( n \ Q^ n \Q$) is distribution-free. Denote by 
q a (d^) its upper a-quantile, and by := (f)^ the indicator of the event 

Q( n \Oo) > <7a(d/^ ). This test actually has Neyman a-structure with respect 

(n) 

to dW and, consequently, is a permutation test. Proposition 6 and Lemma 4, 

moreover, imply that the sequence is asymptotically optimal, uniformly 
in/, against U/W (n) («o,S,/). 

Unfortunately, unlike univariate adaptive signed rank tests, this permuta- 
tion test cannot be implemented, since S, in practice, is unspecified. Instead 
of based on Q( n \0 ), we therefore recommend $( n ) = <jfa , based on the 
test statistic Q^(#o)> which rejects the null hypothesis 7~^ n )(#o) whenever 
Q^ n \Oo) exceeds the a-upper quantile xt^n i- Q °f a chi-square distribution 
with k 2 TTo degrees of freedom. In view of Lemma 4, Q (n) (0o) andQW(0 o ) arc 
asymptotically equivalent under U/^ (^0, /) and contiguous alterna- 
tives: <p n ) and thus share the same asymptotic optimality properties. On 
the other hand, (jy^ loses the attractive finite-sample Neyman a-structure 
of 

Summing up, the following proposition is a direct consequence of Lemma 4. 

Proposition 6. Let Assumptions (A), (B1') and (B2) hold, and assume
that $f \in \mathcal{F}$ satisfies Assumption (C). Then:

(i) statements (i)-(iii) of Proposition 4 hold for $\tilde\phi^{(n)}$; statement (iv) also
holds, with asymptotic noncentrality parameter

$\frac{1}{k^2}\,\mathrm{E}[(\tilde F_k^{-1}(U))^2]\,\mathrm{E}[(\varphi_f(\tilde F_k^{-1}(U)))^2]\,\tau'N_{\theta_0,\Sigma}\tau$

under $\mathcal{H}^{(n)}(\theta_0 + n^{-1/2}\tau, \Sigma, f)$;

(ii) the sequence of tests $\tilde\phi^{(n)}$ is locally asymptotically maximin for $\mathcal{H}^{(n)}(\theta_0)$
against $\bigcup_{\theta\neq\theta_0}\bigcup_\Sigma\bigcup_f \mathcal{H}^{(n)}(\theta, \Sigma, f)$, at probability level $\alpha$, where the third
union is taken over all radial densities $f \in \mathcal{F}$ satisfying Assumptions (B1'),
(B2) and (C).

Proposition 5 readily extends to this adaptive procedure.
6.3. The Gaussian procedure. We now briefly describe the parametric
Gaussian procedure for the problem treated in Propositions 4 and 6. This
Gaussian test will serve as a benchmark in Section 7 for the computation of
asymptotic relative efficiencies.

Under Gaussian assumptions the empirical covariance S( n ) := n~ l x Y^t=i Z$ (0o)z| '(^o) 
is a consistent estimator under H.( n \6o,Y}, f) of the innovation covariance 
(E[(F^(U)) 2 ]/k)V.Let 

where f 1 ^ :=(n-l)" 1 Er= 2 vec(z(" ) (0o)zS / (0o))(vec(z(" ) (0 o )zH / (0 o ))) / . 
In view of the ergodic theorem [see Hannan (1970), Theorem 2, page 203], 

is consistent under ft( n )(0 o , £, /) for (E[(F A T 1 ([/)) 2 ]/A;) 2 S (g) E _1 . The 
following proposition then follows along the same lines as Proposition 4. 

Proposition 7. £e£ Assumptions (A), (Bl') and (B2) ZioW. Define 
(22) <$>(*„) i^TW'tflo)^)" 1 !^). 

(n) 

Consider the sequence of parametric Gaussian tests (jy^ rejecting the null hy- 
pothesis Ti( n \6 ) whenever Qffi(0 o ) exceeds the a-upper quantile x _ a 
of a chi-square distribution with k 2 7tQ degrees of freedom. Then: 

(i) statements (i) and (hi) in Proposition 4 hold for <fffl ; statement (iv) 

also holds, with asymptotic noncentrality parameter (E 2 [Fj~ (U)(pf(Fj~ (U))]/k 2 ) x 
T'N floiS r underH( n \6 + n^ 2 r, E, /); 

(ii) £/ie sequence of tests 4$ is locally asymptotically maximin for Ti^ (6q) 
against the Gaussian alternative U s 7"^ n )(#o?E, , at probability level a. 

The test statistic Q$(0q) is not (not even asymptotically) invariant un- 
der continuous monotone radial transformations. However, it is asymptoti- 
cally distribution-free. On the other hand, Q^{0q), just like Q$(0q) and 
Q^ u \0q), is affine-invariant whenever the null hypothesis is. 

7. Asymptotic performance. 

7.1. Asymptotic relative efficiencies. Computing the ratios of the
noncentrality parameters in the asymptotic distributions of $\tilde Q_K^{(n)}$, $\tilde Q_{f_*}^{(n)}$ and
the adaptive test statistic of Proposition 6
with respect to $Q_{\mathcal{N}}^{(n)}$ (see Propositions 4, 6 and 7) yields the asymptotic
relative efficiencies of these tests with respect to their parametric Gaussian
counterparts.
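Proposition 8. Let Assumptions (A), (B1'), (B2) and (C) hold. Then:

(i) the asymptotic relative efficiency under radial density $f$ of $\tilde\phi_K^{(n)}$ with
respect to $\phi_{\mathcal{N}}^{(n)}$ is

$\mathrm{ARE}_{k,f}(\tilde\phi_K^{(n)}/\phi_{\mathcal{N}}^{(n)}) = \frac{1}{k^2}\,\frac{D_k^2(K_2; f)}{\mathrm{E}[K_2^2(U)]}\,\frac{C_k^2(K_1; f)}{\mathrm{E}[K_1^2(U)]}$;

(ii) assuming that (C') holds instead of (C),

$\mathrm{ARE}_{k,f}(\tilde\phi_{f_*}^{(n)}/\phi_{\mathcal{N}}^{(n)}) = \frac{1}{k^2}\,\frac{D_k^2(f_*, f)}{D_k(f_*)}\,\frac{C_k^2(f_*, f)}{C_k(f_*)}$,

where we write $D_k(f_1, f_2)$ and $C_k(f_1, f_2)$ for $D_k(\tilde F_{1k}^{-1}; f_2)$ and $C_k(\varphi_{f_1}\circ\tilde F_{1k}^{-1}; f_2)$,
respectively, and let $C_k(f_*) := C_k(f_*, f_*)$ and $D_k(f_*) := D_k(f_*, f_*)$;

(iii) assuming, moreover, that $f \in \mathcal{F}$ satisfies Assumption (C), the
asymptotic relative efficiency of the adaptive test $\tilde\phi^{(n)}$ with respect to $\phi_{\mathcal{N}}^{(n)}$ under
the radial density $f$ is

$\mathrm{ARE}_{k,f}(\tilde\phi^{(n)}/\phi_{\mathcal{N}}^{(n)}) = \frac{1}{k^2}\,\mathrm{E}[(\tilde F_k^{-1}(U))^2]\,\mathrm{E}[(\varphi_f(\tilde F_k^{-1}(U)))^2]$.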

The AREs for the fixed-score procedures obtained in Proposition 8 coincide
with those obtained in Hallin and Paindaveine (2002b) for the related
problem of testing randomness against VARMA dependence. The numerical
values of AREs of several versions of the proposed procedures (van der
Waerden and Laplace score tests, sign test, Spearman-type test) with respect
to the Gaussian procedure, under multivariate t-distributions with various
degrees of freedom, are reported there. As usual in rank-based inference,
the gain of efficiency over parametric $L^2$ procedures increases with the tail
weight [see Hallin and Paindaveine (2002b)].

In this section, we thus concentrate on the adaptive procedure described in
Proposition 6. As in Randles (1989), consider the family of power-exponential
distributions with density

(23) $\underline{f}(x) = \kappa_{k,\nu}\,|\Sigma|^{-1/2}\exp[-((x - \theta)'\Sigma^{-1}(x - \theta)/c_0)^{\nu}]$, $\nu > 0$,

with

$c_0 := \frac{k\,\Gamma(k/2\nu)}{\Gamma((k+2)/2\nu)}$ and $\kappa_{k,\nu} := \frac{\nu\,\Gamma(k/2)}{\Gamma(k/2\nu)(\pi c_0)^{k/2}}$.

This family corresponds to radial densities of the form $f_\nu(r) := \exp[-(r^2/c_0)^{\nu}]$,
and allows for considering a variety of tail weights indexed by $\nu$. The
k-variate normal case corresponds to $\nu = 1$, while, for $0 < \nu < 1$ (resp. $\nu > 1$),
the tails are heavier (resp. lighter) than in the normal case.
Table 1

Asymptotic relative efficiencies of the adaptive test $\tilde\phi^{(n)}$ w.r.t. the
Gaussian test $\phi_{\mathcal{N}}^{(n)}$ in the elliptically symmetric power-exponential
family (23), for various values of the tail index $\nu$ and the space
dimension $k$

                                   ν
   k      0.1      0.2     0.3     0.5     1       2       5       10
   1                       28.40   2.00    1.00    1.37    3.18    6.43
   3      261.24   8.08    2.77    1.33    1.00    1.22    2.30    4.26
   4      59.63    4.77    2.16    1.25    1.00    1.18    2.08    3.71
   6      14.81    2.84    1.69    1.17    1.00    1.13    1.81    3.03
   8      7.51     2.19    1.48    1.13    1.00    1.10    1.65    2.63
   10     5.02     1.88    1.37    1.10    1.00    1.09    1.54    2.36
   ∞      1.00     1.00    1.00    1.00    1.00    1.00    1.00    1.00


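Provided that $4\nu + k - 2 > 0$, Proposition 8 yields

(24) $\mathrm{ARE}_{k,f_\nu}(\tilde\phi^{(n)}/\phi_{\mathcal{N}}^{(n)}) = \frac{4\nu^2}{k^2}\,\frac{\Gamma((k+2)/2\nu)\,\Gamma((4\nu+k-2)/2\nu)}{\Gamma^2(k/2\nu)}.$

Table 1 above provides some numerical values of (24).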
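
The entries of Table 1 can be recomputed directly from the gamma-function expression of the ARE, $4\nu^2\,\Gamma((k+2)/2\nu)\,\Gamma((4\nu+k-2)/2\nu)/(k^2\,\Gamma^2(k/2\nu))$. The following is a minimal sketch; the function name is hypothetical, and the existence condition $4\nu + k - 2 > 0$ explains the two empty cells in the row $k = 1$.

```python
import math


def are_adaptive_vs_gaussian(k, nu):
    """ARE of the adaptive test w.r.t. the Gaussian test, power-exponential family."""
    if 4.0 * nu + k - 2.0 <= 0.0:
        raise ValueError("the formula requires 4*nu + k - 2 > 0")
    g = math.gamma
    numerator = 4.0 * nu ** 2 * g((k + 2) / (2 * nu)) * g((4 * nu + k - 2) / (2 * nu))
    return numerator / (k ** 2 * g(k / (2 * nu)) ** 2)


for k in (1, 3, 4):
    row = [round(are_adaptive_vs_gaussian(k, nu), 2)
           for nu in (0.3, 0.5, 1, 2, 5, 10)]
    print(k, row)
```

At $\nu = 1$ (the Gaussian case) the value is 1 in every dimension, in line with the corresponding column of the table.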

7.2. A multivariate version of two classical univariate results. Since the
AREs obtained in Proposition 8 for the fixed-score procedures $\tilde\phi_K^{(n)}$ and $\tilde\phi_{f_*}^{(n)}$
coincide with those in Hallin and Paindaveine (2002b), the generalizations
obtained there of the famous Chernoff-Savage and Hodges-Lehmann results
still hold here. In view of their importance, we adapt these results to the
present context, referring to Hallin and Paindaveine (2002b) for proofs and
details.

A multivariate serial Chernoff-Savage result. As in the univariate case,
the van der Waerden version of the proposed rank-based procedure is
uniformly more efficient than the corresponding parametric Gaussian
procedure. More precisely, the following generalization of the results of Chernoff
and Savage (1958) and Hallin (1994) holds.

Proposition 9. Let Assumption (A) hold. Denote by $\tilde\phi_{\mathrm{vdW}}^{(n)}$ and $\phi_{\mathcal{N}}^{(n)}$
the van der Waerden test, based on the cross-covariance matrices (15), and
the Gaussian test based on the test statistic (22), respectively. For any $f$
satisfying Assumptions (B1') and (B2),

$\mathrm{ARE}_{k,f}(\tilde\phi_{\mathrm{vdW}}^{(n)}/\phi_{\mathcal{N}}^{(n)}) \ge 1$,

where equality holds if and only if $f$ is normal.
Table 2

Some numerical values, for various values k of the space
dimension, of the lower bound for the asymptotic relative efficiency
of the Spearman test $\tilde\phi_{\mathrm{Sp}}^{(n)}$ with respect to the Gaussian one $\phi_{\mathcal{N}}^{(n)}$

   k     inf_f ARE_{k,f}        k      inf_f ARE_{k,f}
   1     0.856                  5      0.818
   2     0.913                  6      0.797
   3     0.878                  10     0.742
   4     0.845                  +∞     0.563


A multivariate serial Hodges-Lehmann result. Denote by $\tilde Q_{\mathrm{Sp}}^{(n)}(\theta_0)$ the
Spearman-type version of the test statistics $\tilde Q_K^{(n)}(\theta_0)$, based on the
cross-covariances (14) associated with linear scores. This statistic can be
considered as the angle-based serial version of Peters and Randles' Wilcoxon-type
test statistic [see Hallin and Paindaveine (2002c) and Peters and
Randles (1990)].

Although the resulting test $\tilde\phi_{\mathrm{Sp}}^{(n)}$ is never optimal [there is no $f_*$ such
that $\tilde Q_{f_*}^{(n)}(\theta_0)$ coincides with $\tilde Q_{\mathrm{Sp}}^{(n)}(\theta_0)$], the resulting Spearman-type
procedure exhibits excellent asymptotic efficiency properties, especially for
relatively small dimensions $k$. To show this, we extend Hodges and Lehmann's
(1956) celebrated "0.864 result" by computing, for any dimension $k$, the
lower bound for the asymptotic relative efficiency of $\tilde\phi_{\mathrm{Sp}}^{(n)}$ with respect to the
Gaussian procedure $\phi_{\mathcal{N}}^{(n)}$. More precisely, we have the following proposition
[see Hallin and Paindaveine (2002b) for the proof].

Proposition 10. Let Assumption (A) hold. Define

$c_k := \inf\{x > 0 \mid (\sqrt{x}\,J_{\sqrt{2k-1}/2}(x))' = 0\}$,

where $J_r$ denotes the Bessel function of the first kind of order $r$. The lower
bound for the asymptotic relative efficiency of $\tilde\phi_{\mathrm{Sp}}^{(n)}$ with respect to $\phi_{\mathcal{N}}^{(n)}$ is

(25) $\inf_f \mathrm{ARE}_{k,f}(\tilde\phi_{\mathrm{Sp}}^{(n)}/\phi_{\mathcal{N}}^{(n)}) = 9(2c_k^2 + k - 1)^4/(2^{10}k^2c_k^4)$,

where the infimum is taken over all radial densities $f$ satisfying Assumptions
(B1') and (B2).

Some numerical values are given in Table 2. Note that the sequence of
lower bounds (25) is monotonically decreasing in $k$ for $k \ge 2$, and tends to
9/16 = 0.5625 as $k \to \infty$.
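
The lower bound $9(2c_k^2 + k - 1)^4/(2^{10}k^2c_k^4)$ can be evaluated numerically once $c_k$, the first positive critical point of $\sqrt{x}\,J_{\sqrt{2k-1}/2}(x)$, is located. The sketch below is an illustration under stated assumptions: the Bessel function is evaluated through its power series (adequate for the moderate arguments needed here), the root is found by a grid scan followed by bisection, and all function names are hypothetical.

```python
import math


def derivative_sign_function(x, order, terms=60):
    """Series for J_order(x)/2 + x J_order'(x).

    This quantity equals sqrt(x) times the derivative of sqrt(x) J_order(x),
    so both have the same sign and the same positive zeros.
    """
    total = 0.0
    for m in range(terms):
        coef = (-1.0) ** m * (2 * m + order + 0.5)
        total += (coef * (x / 2.0) ** (2 * m + order)
                  / (math.factorial(m) * math.gamma(m + order + 1.0)))
    return total


def c_k(k, step=0.01):
    """Smallest x > 0 at which sqrt(x) J_{sqrt(2k-1)/2}(x) has zero derivative."""
    order = math.sqrt(2.0 * k - 1.0) / 2.0
    lo = step
    while derivative_sign_function(lo + step, order) > 0.0:  # scan for sign change
        lo += step
    hi = lo + step
    for _ in range(100):                                     # bisection
        mid = 0.5 * (lo + hi)
        if derivative_sign_function(mid, order) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def spearman_lower_bound(k):
    """Lower bound for the ARE of the Spearman-type test, dimension k."""
    c = c_k(k)
    return 9.0 * (2.0 * c * c + k - 1.0) ** 4 / (2.0 ** 10 * k ** 2 * c ** 4)


for k in (1, 2, 3, 4, 5, 6, 10):
    print(k, round(spearman_lower_bound(k), 3))
```

For $k = 1$ the Bessel order is 1/2, so $\sqrt{x}\,J_{1/2}(x)$ is proportional to $\sin x$ and $c_1 = \pi/2$, which gives a quick sanity check of the root finder.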
APPENDIX

A.l. Proofs of Proposition 1 and Lemmas 2 and 3. 

Proof of Proposition 1. Garel and Hallin (1995) show that the lin- 
ear part in the quadratic approximation of ^^ +n -i/2 T (n) /e -s / can ^ e wr ^" 
ten as 

n-l 



' = E (« - *) V2 tr[di n), (»o)rg, / (»o)], 
1=1 

where 



min(pi,i) i—j min(qo,i—j—k) min(gi,i) 

4 l \e )-.= E E E ^6^,+ E ^y : . 

3=1 k=0 1=0 j=l 

Using tr(AB) = (vec A')'(vecB) and vec(ABC) = (C'ig) A)vecB yields 

/aS n) +bW\' 

(26) ^(n-i) 1 /2 tr [dW'(0 o )rS i/ (0 o )]= : s£> ), 

i=l V (n) , b (n) / 

with a| n ^ and defined at the end of Section 5. 
Since Bm Hm^i = Hm,gi, (26) can be written as 

tWaM (0 o ) = [(H&B&gJ? ^IH^x )rW]'sW (Oo) 
= [(Gf liP jHf lig JrW]'(H« 1 B2 1 )'sg / (0 o ) 

(27) /ai n) +bS n) \' 

where : = £^^(6^ 8> I*)(vec 7 f ) and b| n) := Ef^H^t-j 8 
I fe )(vec4 n) ); the sequences (aj n ^) and (b^ ) clearly satisfy 

(28) (if > + bf* . . . , 4? + bg>)' = M^tW . 
Note that, for i > po + q^ + 1, 

/ po \ pa 

B(L)G' t = D(L) £ G't-i K = E( D ( L ) G *-) A i = 

V i=l / i=l 

Therefore, D(L)G^ = for all t > qo + 1. In the same way, we obtain that 
D(L)Hj = for t >po + l- Now, consider the A; 2 -dimensional operator D^(L) := 
Ifc 2 + 2?=i 9 °(Di 8) Ifc)-^ 4 - This operator is such that, for t — p\ > qo + 1, 
D^(L)af = Ejii(D(^)G[_ i <8> I fc )(vec7j n) ) = 0. Similarly, one can check
that for t - qi > Po + 1, DW(L)bi n) = ^|^ 1 (D(L)Hj_ j 8) I fe )(vec5f } ) = 0. 
This implies that aj n) + b[ n) satisfies DW(L)(aj n) + b t (n) ) = for all t > 



max(p! + q + l,qi+Po + l)=TT+(p + qo) + l- Since { *I 1} ® I fe , . . . , *J 
Ifc} is a fundamental system of solutions of the homogeneous difference equa- 
tion associated with D^(L), we have 



,(po+<?o) , 



/s( n ) + b w 

/ a 7T+l + D 7T + 1 



CO 



(29) 



a TT+l + D 7T + 1 



\a (n) 

Xd n-1 



r 1 - 1 

n— l^vj< 



+ b (n) 



s(») 



+ b 



(n) 



[see, e.g., Hallin (1986)]. Combining (28) and (29), we obtain 



l i a 1 -t- d x , . . . ,a n _ 1 -\- D n _ 1 ) 



Ifc 2 7 









-1 



which, together with (27), establishes the result. □ 



Mg Q T 



(») 



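Proof of Lemma 2. Under $\mathcal{H}^{(n)}(\theta_0,\Sigma,f)$, the residuals $Z_1^{(n)}(\theta_0),\ldots,Z_n^{(n)}(\theta_0)$, from which $C_{\mathrm{Tyl}}^{(n)}$ is computed, are i.i.d. and elliptically symmetric, with mean $0$ and scatter matrix $\Sigma$. Tyler (1987) showed that $C_{\mathrm{Tyl}}^{(n)}$ then is root-$n$ consistent for $C_0 := c^{-1}\Sigma^{-1/2}$, where $c$ denotes the upper left element in $\Sigma^{-1/2}$. The result follows, since for any random vector $X$,

$$
\begin{aligned}
\left\|\frac{C_{\mathrm{Tyl}}^{(n)}X}{\|C_{\mathrm{Tyl}}^{(n)}X\|} - \frac{\Sigma^{-1/2}X}{\|\Sigma^{-1/2}X\|}\right\|
&= \left\|\frac{C_{\mathrm{Tyl}}^{(n)}X}{\|C_{\mathrm{Tyl}}^{(n)}X\|} - \frac{C_0X}{\|C_0X\|}\right\| \\
&\leq \left\|\frac{C_{\mathrm{Tyl}}^{(n)}X}{\|C_{\mathrm{Tyl}}^{(n)}X\|} - \frac{C_{\mathrm{Tyl}}^{(n)}X}{\|C_0X\|}\right\| + \left\|\frac{C_{\mathrm{Tyl}}^{(n)}X}{\|C_0X\|} - \frac{C_0X}{\|C_0X\|}\right\| \\
&\leq 2\,\frac{\|(C_{\mathrm{Tyl}}^{(n)} - C_0)X\|}{\|C_0X\|}
\leq 2\,\frac{\|C_{\mathrm{Tyl}}^{(n)} - C_0\|_{\mathcal{L}}\,\|X\|}{\|C_0X\|}
\leq 2\,\|C_{\mathrm{Tyl}}^{(n)} - C_0\|_{\mathcal{L}}\,\|C_0^{-1}\|_{\mathcal{L}},
\end{aligned}
$$

where $\|T\|_{\mathcal{L}} := \sup\{\|Tx\| : \|x\| = 1\}$ denotes the operator norm of the square matrix $T$. □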
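The bound used in the proof of Lemma 2 can be checked numerically. The Python sketch below is only an illustration under stated assumptions: `C0` and `C_hat` are hypothetical stand-ins for the limit $C_0$ and for Tyler's estimate, generated at random rather than computed from data, and the helper names are made up for the example. It compares the distance between the two normalized vectors with twice the operator norm of the difference times the operator norm of $C_0^{-1}$.

```python
import numpy as np

def unit(v):
    """Return v scaled to Euclidean norm one."""
    return v / np.linalg.norm(v)

def normalization_gap(C_hat, C0, x):
    """Left- and right-hand sides of the bound for one vector x."""
    lhs = np.linalg.norm(unit(C_hat @ x) - unit(C0 @ x))
    # ord=2 is the spectral norm, i.e. sup{||Tx|| : ||x|| = 1}.
    rhs = 2 * np.linalg.norm(C_hat - C0, 2) * np.linalg.norm(np.linalg.inv(C0), 2)
    return lhs, rhs

rng = np.random.default_rng(0)
k = 3
C0 = np.eye(k) + 0.3 * rng.standard_normal((k, k))    # hypothetical limit matrix
C_hat = C0 + 0.01 * rng.standard_normal((k, k))       # hypothetical nearby estimate

gaps = [normalization_gap(C_hat, C0, rng.standard_normal(k)) for _ in range(1000)]
all_hold = all(lhs <= rhs + 1e-12 for lhs, rhs in gaps)
print(all_hold)
```

The right-hand side does not depend on the vector, which is what allows the bound to be applied uniformly to all residuals once the estimate is consistent.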



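Proof of Lemma 3. The definition of $W_t^{(n)}(M)$ and (12) directly yield

$$W_t^{(n)}(M) = \frac{C_{\mathrm{Tyl}}^{(n)}(M)MZ_t^{(n)}}{\|C_{\mathrm{Tyl}}^{(n)}(M)MZ_t^{(n)}\|} = \frac{d\,O\,C_{\mathrm{Tyl}}^{(n)}Z_t^{(n)}}{\|d\,O\,C_{\mathrm{Tyl}}^{(n)}Z_t^{(n)}\|} = O\,W_t^{(n)}.$$

□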
A.2. Proofs of Propositions 2 and 3. The following lemma, which follows along the same lines as Lemma 4 in Hallin and Paindaveine (2002b), will be used in the proof of Proposition 2.

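Lemma 5. Let $i \in \{1,\ldots,n-1\}$ and $t, \tilde t \in \{i+1,\ldots,n\}$ be such that $t \neq \tilde t$. Assume that $g : \mathbb{R}^{nk} = \mathbb{R}^k\times\cdots\times\mathbb{R}^k \to \mathbb{R}$ is even in all its arguments, and such that the expectation below exists. Then, under $\mathcal{H}^{(n)}(\theta_0,\Sigma,f)$,

$$E\bigl[g(Z_1^{(n)}(\theta_0),\ldots,Z_n^{(n)}(\theta_0))\,(P_t'Q_{\tilde t})\,(R_{t-i}'S_{\tilde t-i})\bigr] = 0,$$

where $P_t$, $Q_t$, $R_t$ and $S_t$ are any four statistics among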

Proof of Proposition 2. Throughout, we write <4 , ^4 , -ftf^, 
W 4 (n) and Uj n) for (0 O , E) , ^ (flo.S), (*o) , wj n) (0 O ) and uj n) (0 O , S) 
respectively; all convergences and mathematical expectations are taken as $n \to \infty$, under $\mathcal{H}^{(n)}(\theta_0,\Sigma,f)$. Decompose

(n - i) 1/2 [(cg 55 (eg)" 1 ) vecf - S 'i/ 2) vec rg ;SJ (0 o )] 

into vec(T^ n) + } + T^ n) ), where 

n , jj(n) x , r>(n) 

t=i+l v"--^ 1 / vw-i-i- 



and 



t=i+l 



We proceed by proving that $\mathrm{vec}\,T_1^{(n)}$, $\mathrm{vec}\,T_2^{(n)}$ and $\mathrm{vec}\,T_3^{(n)}$ all converge to $0$ in quadratic mean, as $n \to \infty$. Slutsky's classical argument then concludes the proof.
Let us start with $T_3^{(n)}$. Using the fact that $(\mathrm{vec}\,A)'(\mathrm{vec}\,B) = \mathrm{tr}(A'B)$ and the independence between the $d_t^{(n)}$'s and the $U_t^{(n)}$'s, we obtain



IvecT 



(n)|,2 
\L 2 



E( 

t=i+l 



>h2 
t;i ) 



E 



A". 



R 



(n) 



n+l 



A, 



i? 



t-i 



n + 1 



(n). 



where c 



(n) 



j t-i — ( n — i) for all t = i + 1, . . . , n. Hajek's projection result thus 

implies that $\|\mathrm{vec}\,T_3^{(n)}\|_{L^2} = o(1)$ as $n \to \infty$. The same result also implies that, for all $t = i+1,\ldots,n$,



(30) E 



For T 



R 



(n) 



n + 1 



)-G§) 



ifi(F fc (dS n )))lf a (F fc (dn)) 



(n). 



2i 



O(l). 



2 n \ decomposing WJ'^WJ^ 



r( n )w( n )' 



-U^uKinto(Wr 



r(") 



Tjf^ (W^ — Uj™^)', then using the identity (vec A)'(vec B) = tr(A'B) again 



r(") 



and Lemma 5, one obtains 



(31) 



|vecT 2 >\\ L 2 
< 2(n- 



+ 2(n 



o- 1 E e 

t=i+l 
n 

t=t+l 



Al 



i? 



(n) 



n + 1 



R 



(n) 
t-i 



Al 



(n) 



n+l 



n + 1 
A 2 



|W 



(n) 



(«)| 



K t-i 



n + l 



u 



(n)| 
t-i\ 



Consider the first term in the right-hand side of (31) (the second term can 



be dealt with in the same way). Let A$ := Ki(R^ 1 ' j(n + l))K 2 {R^J{n + 
1)) and B$ := K 1 {F k (d ( t n) ))K 2 (F k (d[^\)). Using (30) and the independence 
between the $d_t^{(n)}$'s and the $U_t^{(n)}$'s, we obtain



E[(4?) 2 ||WW 



U 



(")| 



E[(B 



t:i 



u| n) || 2 ] + o(l) 



ll^i(c/)lli 2 ||^ 2 (c/)lli 2 ||w t 



(n) 



U 



(n) N 2 
\L 2 



+ o(l), 



where $U$ is uniformly distributed over $]0,1[$. Lemma 2 thus implies that $\mathrm{vec}\,T_2^{(n)}$ converges to $0$ in quadratic mean.
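Two algebraic facts carry the treatment of $T_3^{(n)}$ and $T_2^{(n)}$ above: the identity $(\mathrm{vec}\,A)'(\mathrm{vec}\,B) = \mathrm{tr}(A'B)$ and a telescoping split of a difference of outer products into pieces that each involve a single sign discrepancy. The Python sketch below checks both numerically. The unit vectors are random stand-ins (an assumption of the example, not the paper's $W_t^{(n)}$ and $U_t^{(n)}$), and the particular split shown is one natural choice that may differ in detail from the one used in the proof.

```python
import numpy as np

def vec(A):
    """Stack the columns of A on top of each other."""
    return A.reshape(-1, order="F")

rng = np.random.default_rng(1)
k = 4
A = rng.standard_normal((k, k))
B = rng.standard_normal((k, k))
identity_gap = abs(vec(A) @ vec(B) - np.trace(A.T @ B))

# Hypothetical unit vectors standing in for W_t, W_{t-i}, U_t, U_{t-i}.
W_t, W_lag, U_t, U_lag = (v / np.linalg.norm(v) for v in rng.standard_normal((4, k)))

# W_t W_lag' - U_t U_lag' = (W_t - U_t) W_lag' + U_t (W_lag - U_lag)'
lhs = np.outer(W_t, W_lag) - np.outer(U_t, U_lag)
rhs = np.outer(W_t - U_t, W_lag) + np.outer(U_t, W_lag - U_lag)
split_gap = np.abs(lhs - rhs).max()

# Because the other factor is a unit vector, the Frobenius norm of each piece
# equals the norm of one sign discrepancy.
piece_gap = abs(np.linalg.norm(np.outer(W_t - U_t, W_lag)) - np.linalg.norm(W_t - U_t))
print(identity_gap, split_gap, piece_gap)
```

This is why a bound on $\|W_t^{(n)} - U_t^{(n)}\|$, as provided by Lemma 2, is enough to control the whole term.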
Finally, using Lemma 5 again, 



vecT^ || L 2 = {n ■ 



1 E 

t=i+l 



A, 



bin) 



R 



n + l 



A, 



R 



t-i 



-1/ 



Ai 



a! w) 

n + l 



n + l 
This entails that $\mathrm{vec}\,T_1^{(n)}$ also is $o_{\mathrm{qm}}(1)$, provided that



(32) K x 



R 



(n) 



n+l 



R 



(n) 



n + l 



R 



(n) 



n+l 



R 



n + l 



>0 



as n 



oo. 



Now, Lemma 1 establishes the same convergence as in (32), but in probabil- 
ity. On the other hand, it follows from (30) that [K 1 (R t /(n + l))K 2 {Rt-i/{n+ 
l))] 2 is uniformly integrable, which (in view of the invariance of Tyler's esti- 
mator of scatter under permutations of the residuals) implies that [Ki(Rt/(n + 
l))K2(Rt~i/(n + l))] 2 also is. The L 2 convergence in (32) follows. 



Summing up, we have shown that (n — i) 1 / 2 [(C^ ® (C 



Tyl 



'Tyl J 



-Vecfg^o) 



(E- 1 / 2 0S /1 / 2 )vecrg. SJ (0 o )] is o qm (l) as n 
proof, since, from a multivariate application of Slutsky's theorem, 



oo. This concludes the 



(n 



>l/2r^(") 



! [(c Tyl 

under TiW (0 O , S,/). □ 



(c$))- 



vecr^(0 o ) 



s /1 /2 )vec r^(0 o )] = op (i), 



Proof of Proposition 3. Under $\mathcal{H}^{(n)}(\theta_0,\Sigma,f)$, one can use the same argument as in Lemma 4.12 in Garel and Hallin (1995). The result under the sequence of alternatives is obtained as usual, first establishing the joint normality of $S^{(n)}(\theta_0)$ and $L^{(n)}_{\theta_0+n^{-1/2}\tau/\theta_0;\Sigma,f}$ under $\mathcal{H}^{(n)}(\theta_0,\Sigma,f)$, then applying Le Cam's third lemma; the required joint normality easily follows from a routine application of the classical Cramér-Wold device. □



A.3. Proofs of Propositions 4 and 5. 



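Proof of Proposition 4. (i) Let $\{\Psi_t^{(1)},\ldots,\Psi_t^{(p_0+q_0)}\}$ and $\{\Phi_t^{(1)},\ldots,\Phi_t^{(p_0+q_0)}\}$ be two fundamental systems of solutions associated with $D(L)$. The vector structure of the space of solutions of $D(L)x_t = 0$, $x_t \in \mathbb{R}^k$, implies that, for all $j = 1,\ldots,p_0+q_0$, there exists a $k(p_0+q_0)\times k$ matrix $A_j$ such that $\Phi_t^{(j)} = (\Psi_t^{(1)},\ldots,\Psi_t^{(p_0+q_0)})A_j$. Letting $A := (A_1,\ldots,A_{p_0+q_0})$, this implies that

$$
\begin{pmatrix}
\Phi_{\pi+1}^{(1)} & \cdots & \Phi_{\pi+1}^{(p_0+q_0)}\\
\Phi_{\pi+2}^{(1)} & \cdots & \Phi_{\pi+2}^{(p_0+q_0)}\\
\vdots & & \vdots\\
\Phi_{m}^{(1)} & \cdots & \Phi_{m}^{(p_0+q_0)}
\end{pmatrix}
=
\begin{pmatrix}
\Psi_{\pi+1}^{(1)} & \cdots & \Psi_{\pi+1}^{(p_0+q_0)}\\
\Psi_{\pi+2}^{(1)} & \cdots & \Psi_{\pi+2}^{(p_0+q_0)}\\
\vdots & & \vdots\\
\Psi_{m}^{(1)} & \cdots & \Psi_{m}^{(p_0+q_0)}
\end{pmatrix} A,
$$

so that $\Phi_m = \Psi_m(A\otimes I_k)$, where $\Phi_m$ is the equivalent of $\Psi_m$, but computed from the $\Phi_t^{(j)}$'s. Thus, with obvious notation, $Q_\Phi^{(n)} = Q_\Psi^{(n)}\bar A$ for all $n$, where

$$\bar A := \begin{pmatrix} I_{k^2\pi} & 0\\ 0 & A\otimes I_k\end{pmatrix},$$

yielding (note that since the $\Psi_t^{(j)}$'s and $\Phi_t^{(j)}$'s constitute fundamental systems, $A$, and hence $\bar A$, are nonsingular)

$$
\begin{aligned}
T_\Phi^{(n)\prime}(\theta_0)\,(J_\Phi^{(n)})^{-1}\,T_\Phi^{(n)}(\theta_0)
&= [T_\Psi^{(n)\prime}(\theta_0)\bar A]\,[\bar A' J_\Psi^{(n)} \bar A]^{-1}\,[\bar A' T_\Psi^{(n)}(\theta_0)]\\
&= T_\Psi^{(n)\prime}(\theta_0)\,(J_\Psi^{(n)})^{-1}\,T_\Psi^{(n)}(\theta_0),
\end{aligned}
$$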

as was to be proved. The statement about the dependence on $p_1$ and $q_1$ is trivial, since $T^{(n)}(\theta_0)$, $J^{(n)}$ as well as $\pi_0$ depend on $p_1$ and $q_1$ only through $\pi$.
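The invariance argument in part (i) rests on a general fact: a quadratic form $T'J^{-1}T$ is unchanged when $T$ is replaced by $\bar A'T$ and $J$ by $\bar A'J\bar A$ for a nonsingular $\bar A$. The Python sketch below verifies this numerically. All matrices are random stand-ins chosen for the example (assumptions, not quantities from the paper); `A_bar` is built as a unit upper triangular matrix only to guarantee that it is nonsingular.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 6

T_psi = rng.standard_normal(m)                     # stand-in for the statistic
G = rng.standard_normal((m, m))
J_psi = G @ G.T + np.eye(m)                        # symmetric positive definite

# Unit upper triangular, hence determinant one and nonsingular.
A_bar = np.eye(m) + np.triu(rng.standard_normal((m, m)), k=1)

# Quantities computed from the second fundamental system.
T_phi = A_bar.T @ T_psi
J_phi = A_bar.T @ J_psi @ A_bar

q_psi = T_psi @ np.linalg.solve(J_psi, T_psi)
q_phi = T_phi @ np.linalg.solve(J_phi, T_phi)
print(q_psi, q_phi)
```

The two printed values agree up to rounding, which mirrors the statement that the test statistic does not depend on the fundamental system chosen.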

(ii) Letting 

n^T%{0 ) := Q W'((n - l^facf & ;SS (0o))', • • ■ , (vecf £!i;*; S (*o))')', 
with 

' ' y \n — i M \ 7i+l / V n+1 

\ t=i+i 

xwf'(fl„)wae„))(cg)^ 

one can verify (proceeding as for the first term in the decomposition argu- 
ment in the proof of Proposition 2) that n l / 2 {T { ^ {6 ) - T^ S (0 O )) tends 

to zero in quadratic mean as n — » oo under U/^ (n) ( o, S, /). This entails 
that 

q£ } («o) 



E[K 2 (^)]E[K|([/)] 
x (n^T^^o))'^)- 1 ^ 1 / 2 ^^)) + op(1) 

is asymptotically invariant with respect to under U/^ n H^o> ^> /)) 

since n 1//2 T^.' ) s (0o) and are strictly invariant with respect to the same 

group. °' 

(iii), (iv) Proposition 2 and the multivariate Slutsky theorem show that 

qP(0 ) has the same asymptotic behavior [under 7i.( n \6o, X, /), as well as 
under the sequence of local alternatives 7i.( n \9o + n _1 / 2 r, S, /)] as 



A: 



2 



1/2 T&k/( o)) , J - 1 , S (- 1/2 T^ ; ) s , / (0o)), 



E[^ 2 (C/)]E[K 2 (C/)] 

where n 1 / 2 T^ ; ) s / (0 o ) := Qefsi-i^Ej^o) [see (16)]. Now, Proposition 3 
and a classical result on triangular arrays [Brockwell and Davis (1987), 
Proposition 6.3.9] imply that ra 1 / 2 T^ s j(#o) is asymptotically fc 2 7To-variate 
normal, with mean under H^ n \e , S, /), and mean 

^D k (K 2 ;f)C k (Ki;f)Je o ^P 0o M eo T 

under H {n) {0 o + n~ l / 2 T, £, /), and with covariance matrix (E[i"T 2 ([/)] x 
E[K%(U)]/k 2 )J eo ^ under both. The result follows. 

(v) It follows from Le Cam [(1986), Section 11.9] and the LAN property 

(n) 

in Proposition 1 that the test (p^ * rejecting the null hypothesis whenever 

A^(0 o )(r s , A (0o))-Ag A (0 o ) > xh- a , 

where $A^{-}$ denotes any arbitrary generalized inverse of $A$ and $s := \mathrm{rank}(\Gamma_{\Sigma,f_*}(\theta_0))$, is locally and asymptotically maximin, at probability level $\alpha$, for $\mathcal{H}^{(n)}(\theta_0,\Sigma,f_*)$ against $\bigcup_{\theta\neq\theta_0}\mathcal{H}^{(n)}(\theta,\Sigma,f_*)$. Note that $\mathrm{rank}(\Gamma_{\Sigma,f_*}(\theta_0)) = \mathrm{rank}(M'_{\theta_0}P'_{\theta_0}J_{\theta_0,\Sigma}P_{\theta_0}M_{\theta_0}) = \min(k^2(p_1+q_1), k^2\pi_0) = k^2\pi_0$, since $M_{\theta_0}$, $P_{\theta_0}$ and $J_{\theta_0,\Sigma}$ have maximal rank. Of course, the same optimality property holds for the asymptotically equivalent [under $\mathcal{H}^{(n)}(\theta_0,\Sigma,f_*)$, as well as under contiguous

alternatives] test <f>^ that rejects the null hypothesis whenever 
AP f [(6 )(tfJ(e )rAP u (e ) > xko,!-.- 

where A^(0 O ) := n^M^P^T^o), with := ipf^F^ 1 and tf 2 = 
i^ 1 , and 

rtVo) := ^ 1;A/ * M^ P^ jW £ p eo M eo = r B|A (g ) + op(1) 

under H< n) (0 O ,E, /*). But, in view of Lemma 2.2.5(c) of Rao and Mitra 
(1971), 

<(<W)-Af| t (0„) 



"E[(^(F- fc 1 (C/))) 2 ]E[(F^ 1 ([/)) 2 ] V UA 
</>^ and </>^, thus, are the same test. The result follows. □ 

Proof of Proposition 5. (i) Model (1) under $\mathcal{H}^{(n)}(\theta_0)$ can be written in the form

$$MA(L)M^{-1}MX_t = MB(L)M^{-1}M\varepsilon_t,$$

where $M$ is an arbitrary full-rank $k\times k$ matrix. This null hypothesis is thus invariant under the group of affine transformations $\varepsilon_t \mapsto M\varepsilon_t$ if and only if $MA_iM^{-1} = A_i$ for all $i = 1,\ldots,p_0$ and $MB_jM^{-1} = B_j$ for all $j = 1,\ldots,q_0$, that is, iff each $A_i$ and each $B_j$ commutes with any invertible matrix $M$, which holds true iff they are proportional to the $k\times k$ identity matrix.
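The last step of part (i), that only multiples of the identity commute with every invertible matrix, can be illustrated with a small Python check. The matrices below are hypothetical examples chosen for the sketch: a scalar matrix is left unchanged by conjugation, whereas a diagonal matrix with unequal entries is altered by a simple coordinate swap.

```python
import numpy as np

def is_similarity_invariant(A, M, tol=1e-10):
    """True when M A M^{-1} equals A up to rounding."""
    return np.allclose(M @ A @ np.linalg.inv(M), A, atol=tol)

k = 3
scalar = 0.5 * np.eye(k)                 # proportional to the identity
non_scalar = np.diag([0.5, 0.5, 0.2])    # diagonal but not scalar

# Two explicit invertible matrices: a coordinate swap and a shear.
swap = np.eye(k)[[0, 2, 1]]
shear = np.eye(k)
shear[0, 1] = 1.0

scalar_ok = all(is_similarity_invariant(scalar, M) for M in (swap, shear))
diag_fails = not is_similarity_invariant(non_scalar, swap)
print(scalar_ok, diag_fails)
```

A finite set of test matrices cannot prove the "only if" direction in general; the check merely shows the mechanism by which a non-scalar coefficient breaks invariance.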

(ii) Let $M$ be some nonsingular $k\times k$ matrix. For any statistic $T = T(X_{-p_0+1}^{(n)},\ldots,X_n^{(n)})$, write $T(M) := T(MX_{-p_0+1}^{(n)},\ldots,MX_n^{(n)})$. It follows from

Lemma 3 and from the equivariance properties of Gjyj that r^-(M) = 

M'-^Jm'. Hence, S^(M) = [I n _i g) (M g> M'" 1 )]!^. In the same way, 

[I„„ 1 ®(S (n) (M)®(S (n) (M))- 1 )] 

= g) (M g) rVT 1 )][!„„! (S W (E^) _1 )][I n _i g) (M g) M'" 1 )]'. 

Now, $A_i = a_iI_k$ clearly implies that the Green matrices of the operator $A(L)$ all are proportional to the identity matrix. The same property holds for $B(L)$. It is then easy to verify that the operator $D(L)$ also is scalar (meaning that $D_i$ is proportional to the identity matrix for all $i = 1,\ldots,p_0+q_0$). This implies that the fundamental system of solutions provided by Green's matrices of $D(L)$ contains only matrices that are proportional to the identity matrix. Hence, $Q_{\theta_0}^{(n)} = W^{(n)}\otimes I_{k^2}$ for some $(n-1)\times\pi_0$ matrix $W^{(n)}$. It follows that

[I n _x g) (M (g M'" 1 )]' = Q^iU ig (M (g M'- 1 )]', 
which entails T^(M) = [1^ «(M« M /_1 )]T^ and 

J^Um) = [1^ (g (M (g M'" 1 )] J^Ul^ (g (M (g M'" 1 )]'. 

Consequently, Q J } (M) = q£ } . □ 
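The commutation step in part (ii) is an instance of the mixed-product property of Kronecker products: when $Q = W\otimes I_{k^2}$, a block-diagonal factor $I\otimes N$ with $N = M\otimes M'^{-1}$ can be moved from one side of $Q'$ to the other. The Python sketch below checks this and the resulting invariance of a quadratic form. The sizes, the matrix `M`, and the random `W`, `T` and `J` are assumptions made for the illustration and are not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
k, n_rows, pi0 = 2, 5, 3                     # toy sizes; n_rows plays the role of n - 1

M = np.array([[2.0, 1.0], [0.5, 1.0]])       # hypothetical nonsingular M (det = 1.5)
N = np.kron(M, np.linalg.inv(M).T)           # M kron (M')^{-1}

W = rng.standard_normal((n_rows, pi0))       # stand-in for the (n-1) x pi0 matrix
Q = np.kron(W, np.eye(k * k))                # Q = W kron I_{k^2}

big = np.kron(np.eye(n_rows), N)             # I_{n-1} kron N
small = np.kron(np.eye(pi0), N)              # I_{pi0} kron N
commute_gap = np.abs(Q.T @ big.T - small.T @ Q.T).max()

# T' J^{-1} T is unchanged when T -> small T and J -> small J small'.
dim = pi0 * k * k
T = rng.standard_normal(dim)
G = rng.standard_normal((dim, dim))
J = G @ G.T + np.eye(dim)                    # symmetric positive definite
q_before = T @ np.linalg.solve(J, T)
T_M, J_M = small @ T, small @ J @ small.T
q_after = T_M @ np.linalg.solve(J_M, T_M)
print(commute_gap, q_before, q_after)
```

The scalar structure of $Q$ is what makes the first check succeed; for a general $Q$ the two products would differ.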